Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be an i.i.d. sample from a bivariate distribution function that lies in the max-domain of attraction of an extreme value distribution. The asymptotic joint distribution of the standardized component-wise maxima $\bigvee_{i=1}^n X_i$ and $\bigvee_{i=1}^n Y_i$ is then characterized by the marginal extreme value indices and the tail copula $R$. We propose a procedure for constructing asymptotically distribution-free goodness-of-fit tests for the tail copula $R$. The procedure is based on a transformation of a suitable empirical process derived from a semi-parametric estimator of $R$. The transformed empirical process converges weakly to a standard Wiener process, paving the way for a multitude of asymptotically distribution-free goodness-of-fit tests. We also extend our results to the $m$-variate ($m > 2$) case.

In a simulation study we show that the limit theorems provide good approximations for finite samples and that tests based on the transformed empirical process have high power.


1. Introduction. Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be an i.i.d. sample from a bivariate distribution function (d.f.) $F$ with marginal d.f.'s $F_1(x) = F(x,\infty)$ and $F_2(y) = F(\infty,y)$ for $x,y\in\mathbb{R}$. Suppose that $F$ is in the max-domain of attraction of some bivariate d.f. $G$ with nondegenerate marginals. That is, suppose that there exist normalizing sequences $a_1(n), a_2(n) > 0$ and $b_1(n), b_2(n)\in\mathbb{R}$ such that

$$P\left(\frac{\bigvee_{i=1}^n X_i - b_1(n)}{a_1(n)} \le x,\ \frac{\bigvee_{i=1}^n Y_i - b_2(n)}{a_2(n)} \le y\right) \to G(x,y) \tag{1}$$

as $n\to\infty$, for all continuity points $(x,y)\in\mathbb{R}^2$ of $G$. Of course, (1) is equivalent to

$$F^n\big(a_1(n)x + b_1(n),\ a_2(n)y + b_2(n)\big) \to G(x,y), \tag{2}$$

and the d.f. $G$ is, by definition, an extreme value d.f.

It is a classical result in extreme value theory [see de Haan and Ferreira (2006), Theorem 1.1.3] that the normalizing sequences $a_1, b_1$ and $a_2, b_2$ can be chosen in such a way that the marginal d.f.'s $G_1(x) = G(x,\infty)$ and $G_2(y) = G(\infty,y)$ are of the form

$$\begin{aligned} G_1(x) &= \exp\{-(1+\gamma_1 x)^{-1/\gamma_1}\}, \quad 1+\gamma_1 x > 0,\\ G_2(y) &= \exp\{-(1+\gamma_2 y)^{-1/\gamma_2}\}, \quad 1+\gamma_2 y > 0 \end{aligned} \tag{3}$$

for some $\gamma_1,\gamma_2\in\mathbb{R}$. [Here, and in the rest of the paper, expressions of the form $(1+\gamma\,\cdot)^{1/\gamma}$ should be interpreted as $\exp(\cdot)$ when $\gamma = 0$.] We will assume throughout that the normalizing sequences are chosen in this way. Then $G$ is necessarily continuous, as it has continuous marginal d.f.'s, and the equivalent convergences (1) and (2) hold for all $(x,y)\in[-\infty,\infty]^2$. Also, $G$ can be fully characterized by the marginal extreme value indices $\gamma_1,\gamma_2$ and a description of the dependence structure between the marginal d.f.'s $G_1$ and $G_2$. Due to de Haan and Resnick (1977), it is known that the class of possible dependence structures for bivariate extreme value distributions does not form a finite-dimensional parametric family. Nevertheless, there are various equivalent ways of describing extreme value (or tail) dependence structures, each with its own advantages in applications. For an overview, we refer to Beirlant et al. (2004), Chapter 8 or de Haan and Ferreira (2006), Part II.

In this paper, we will focus on one possible description of the bivariate tail dependence structure, namely the tail copula. For a bivariate extreme value d.f. $G$ with marginal d.f.'s as given in (3), the tail copula $R$ is defined as

$$R(x,y) = x + y + \log G\left(\frac{x^{-\gamma_1}-1}{\gamma_1},\ \frac{y^{-\gamma_2}-1}{\gamma_2}\right), \quad (x,y)\in[0,\infty)^2. \tag{4}$$

We say that a bivariate d.f. $F$ belonging to the domain of attraction of $G$ has associated tail copula $R$. Tail copulas are not copula functions in the usual sense (for example, they are not distribution functions of probability measures), yet they fully capture the asymptotic dependence structure of the component-wise maxima, just as copulas capture the dependence structure of random vectors. Indeed, it is easily checked that $G(x,y) = C_G(G_1(x), G_2(y))$, with

$$C_G(u,v) = uv\exp\{R(-\log u, -\log v)\}, \quad (u,v)\in(0,1]^2. \tag{5}$$

In other words, $G$ is the unique d.f. characterized by the marginal d.f.'s (3) and the copula (5).
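As a numerical sanity check of (4) and (5), the sketch below assumes a hypothetical extreme value d.f. with Gumbel margins ($\gamma_1 = \gamma_2 = 0$, so $(x^{-\gamma}-1)/\gamma$ becomes $-\log x$) and logistic dependence, $G(x,y) = \exp\{-(e^{-x/\theta}+e^{-y/\theta})^\theta\}$. It recovers $R$ from (4), compares the result with the closed form $x+y-(x^{1/\theta}+y^{1/\theta})^\theta$, and checks that the copula (5) returns $G$. The choice of $G$, the value $\theta = 0.5$ and all function names are illustrative.

```python
import numpy as np

THETA = 0.5  # illustrative logistic dependence parameter in (0, 1]


def G(x, y, theta=THETA):
    """Bivariate extreme value d.f. with Gumbel margins and logistic dependence."""
    return np.exp(-(np.exp(-x / theta) + np.exp(-y / theta)) ** theta)


def G1(x):
    """Gumbel margin, i.e. (3) with gamma = 0 (used for both margins here)."""
    return np.exp(-np.exp(-x))


def tail_copula_from_G(x, y):
    """Formula (4) with gamma_1 = gamma_2 = 0, where (x**(-g) - 1)/g -> -log x."""
    return x + y + np.log(G(-np.log(x), -np.log(y)))


def copula_from_R(u, v, R):
    """Formula (5): C_G(u, v) = u v exp{R(-log u, -log v)}."""
    return u * v * np.exp(R(-np.log(u), -np.log(v)))


x, y = 0.7, 1.9
R_closed = x + y - (x ** (1 / THETA) + y ** (1 / THETA)) ** THETA
print(tail_copula_from_G(x, y), R_closed)                        # the two values agree
print(copula_from_R(G1(x), G1(y), tail_copula_from_G), G(x, y))  # C_G(G1, G2) equals G
```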

We conclude that the asymptotic joint behavior of the standardized component-wise maxima $\bigvee_{i=1}^n X_i$ and $\bigvee_{i=1}^n Y_i$ is fully characterized by the marginal extreme value indices $\gamma_1,\gamma_2$ appearing in (3) and the tail copula $R$ defined in (4). Statistical inference about extreme value indices is a classical and well-studied problem in univariate extreme value theory; we refer to Beirlant et al. (2004), Chapters 4 and 5 or de Haan and Ferreira (2006), Chapter 3 for more information. There is also a growing literature on inference about the tail dependence structure; see Beirlant et al. (2004), Chapter 9 or de Haan and Ferreira (2006), Chapter 7, for an overview. In this paper, we will focus on inference about $R$. In particular, we will propose a semi-parametric estimator of $R$, describe a transformation of the empirical process derived from it and demonstrate how this transformed empirical process can serve as a basis to construct asymptotically distribution-free goodness-of-fit tests for $R$.

1.1. More on tail dependence. The tail copula $R$ can also be obtained (and its domain extended) in the following way from the d.f. $F$:

$$R(x,y) = \lim_{t\to\infty} t\,P\big(1-F_1(X)\le x/t,\ 1-F_2(Y)\le y/t\big), \quad (x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}, \tag{6}$$

where $(X,Y)$ denotes a random vector with d.f. $F$. If $F$ has continuous marginals, (6) can also be written as

$$R(x,y) = \lim_{t\to\infty} t\,\bar C_F(x/t, y/t), \quad (x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}, \tag{7}$$

where $\bar C_F$ denotes the "survival copula" of $F$, that is, the copula associated with $(-X,-Y)$. Observe that $R(x,\infty) = R(\infty,x) = x$ for all $x\in[0,\infty)$ and $0\le R(x,y)\le x\wedge y$ for all $(x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}$. It is also clear from (6) that $R$ is homogeneous of order 1, so the restriction of $R$ to, for example, $[0,1]^2$ determines $R$ on its entire domain. The characterization (6) stems from Huang (1992), where it is used to derive a nonparametric estimator for $R$. We will use an alternative, semi-parametric estimator better suited for our purposes; see Section 2.

The value $R(1,1)$ is known in the applied extreme value literature as the (upper) tail dependence coefficient and is widely used as a measure of tail dependence. When $R(1,1) = 0$, which is equivalent to $R = 0$ on $[0,\infty)^2$, we call $X$ and $Y$ tail independent. When $R(1,1) > 0$, we say that $X$ and $Y$ exhibit tail dependence. Other ways of describing the tail dependence structure include the stable tail dependence function, the exponent measure, the spectral measure and the Pickands dependence function; see the monographs Kotz and Nadarajah (2000), Beirlant et al. (2004), de Haan and Ferreira (2006) and the many references therein.
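The limit (6) suggests a simple rank-based estimator: take $t = n/k$ for an intermediate $k$ and replace $1-F_j$ by an empirical counterpart. The sketch below is such an estimator, written in the spirit of Huang (1992) but not claimed to be his exact definition, nor the semi-parametric estimator of Section 2. The convention $1-\hat F_1(X_i) = (n+1-\mathrm{rank}(X_i))/n$ and the function name are assumptions. For perfectly dependent data the estimate of the tail dependence coefficient $R(1,1)$ is exactly 1. For countermonotone data it is 0, as long as $2k\le n$.

```python
import numpy as np


def empirical_tail_copula(x, y, k, u, v):
    """Rank-based estimate of R(u, v), obtained from (6) with t = n / k.

    With 1 - F_1(X_i) replaced by (n + 1 - rank(X_i)) / n, the event
    {1 - F_1(X_i) <= u k / n} becomes {n + 1 - rank(X_i) <= k u}.
    Assumes continuous data (no ties).
    """
    x, y = np.asarray(x), np.asarray(y)
    n = x.size
    rank_x = np.argsort(np.argsort(x)) + 1  # ranks 1..n, the largest value has rank n
    rank_y = np.argsort(np.argsort(y)) + 1
    hit = (n + 1 - rank_x <= k * u) & (n + 1 - rank_y <= k * v)
    return hit.sum() / k


rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
print(empirical_tail_copula(z, z, 200, 1.0, 1.0))   # complete dependence
print(empirical_tail_copula(z, -z, 200, 1.0, 1.0))  # countermonotone: no upper tail dependence
```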

We also note here that the function $R$ generates a $\sigma$-finite measure, which we will also, without confusion, denote by $R$, on Borel subsets of $[0,\infty]^2\setminus\{(\infty,\infty)\}$, through the identity

$$R([0,x]\times[0,y]) := R(x,y), \quad (x,y)\in[0,\infty]^2\setminus\{(\infty,\infty)\}. \tag{8}$$
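Through (8), the $R$-measure of any rectangle follows from the function $R$ by inclusion–exclusion. Rectangle increments of processes are formed the same way in Section 3. The minimal sketch below uses the logistic tail copula of Section 6 with $\theta = 1/2$ purely as an example. The function names are illustrative.

```python
import math


def rect_measure(R, x0, x1, y0, y1):
    """R((x0, x1] x (y0, y1]) computed from the function R via (8)."""
    return R(x1, y1) - R(x0, y1) - R(x1, y0) + R(x0, y0)


def R_logistic(x, y, theta=0.5):
    """Logistic tail copula x + y - (x**(1/theta) + y**(1/theta))**theta."""
    return x + y - (x ** (1 / theta) + y ** (1 / theta)) ** theta


m = rect_measure(R_logistic, 1.0, 2.0, 1.0, 2.0)
print(m)                                                  # positive mass
print(rect_measure(R_logistic, 2.0, 4.0, 2.0, 4.0) / m)  # 2 up to rounding (homogeneity of order 1)
```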

1.2. Goodness-of-fit testing. In the literature and in practice, a parametric model is often used for the tail copula $R$; see, for example, Coles and Tawn (1991) or Joe, Smith and Weissman (1992). Testing the goodness-of-fit of the parametric model to a given data sample is therefore an important problem with abundant applications in many fields, such as insurance and risk management, finance and econometrics, and hydrology and meteorology. In this paper, we develop a procedure for constructing asymptotically distribution-free goodness-of-fit tests for the tail copula $R$ of a bivariate d.f. $F$. We consider null hypotheses of the form $R\in\mathcal{R} = \{R_\theta : \theta\in\Theta\}$, where $\mathcal{R}$ is a parametric family of tail copulas. Of course, by taking the parameter space $\Theta$ to consist of a single point, our results can also be used to test the goodness-of-fit of a fully specified tail copula to the data.

Our approach is based on a semi-parametric estimator $\hat R_n$ of $R$, to be defined below. We consider a suitably normalized difference, $\hat\eta_n$, between $\hat R_n$ and $R_{\hat\theta}$ (with $\hat\theta$ denoting a suitable estimator of $\theta$), and we show that, under the null hypothesis, a proper transformation of $\hat\eta_n$ converges weakly to a standard Wiener process $W$. This fundamental result allows one to construct a myriad of goodness-of-fit tests based on comparisons of appropriate functionals of the transformed process (the test statistics the practitioner may prefer to use) with the same functionals of $W$. We emphasize that, since $W$ is a standard Wiener process, our approach leads to asymptotically distribution-free goodness-of-fit tests: under the null hypothesis, the asymptotic distributions of the test statistics do not depend on $\mathcal{R}$ or the true $\theta$. A simulation study confirms the applicability of our approach for finite samples.

Testing (and estimation) problems for the tail copula have been studied in the recent literature. In Einmahl, de Haan and Li (2006) the existence of $R$ is tested, rather than its membership of a parametric family. In de Haan, Neves and Peng (2008) a specific Cramér–von Mises type statistic for testing $R\in\{R_\theta : \theta\in\Theta\}$ is studied for two-dimensional data and a one-dimensional parameter; the test statistic has a complicated limiting distribution under the null hypothesis. In Einmahl, Krajina and Segers (2012) it is assumed that $R\in\{R_\theta : \theta\in\Theta\}$, and it is then tested whether $R$ is a member of a smaller parametric family, obtained by setting some components of $\theta$ equal to fixed values.
The remainder of the paper is organized as follows. In Section 2, we describe the semi-parametric estimator $\hat R_n$, introduce the empirical process $\hat\eta_n$, which is the normalized difference between $\hat R_n$ and $R_{\hat\theta}$, and describe the weak limit $\hat\eta$ of $\hat\eta_n$ as $n\to\infty$. In Section 3, we describe our key transformation from $\hat\eta$ into a standard Wiener process. In Section 4, we show that the same transformation (or rather an empirical version of it, with unknown parameters replaced by estimators) applied to $\hat\eta_n$ produces a process whose weak limit is a standard Wiener process. This is our main result. In Section 5, we extend this result to the $m$-dimensional setting, for $m > 2$. Finally, in Section 6, we demonstrate through Monte Carlo simulations the applicability of our limit theorems in finite samples and the high power of tests based on our results. Proofs are deferred to Section 7. The paper is supplemented by an online appendix, see Can et al. (2015), which contains some details suppressed in Section 2 as well as technical specifics about the Monte Carlo simulations, including the computer code.

2. An estimator for $R$ and its asymptotic behavior. As in Section 1, we let $(X_1,Y_1),\dots,(X_n,Y_n)$ denote an i.i.d. sample from a bivariate d.f. $F$ with marginal d.f.'s $F_1$ and $F_2$. We assume that the bivariate domain of attraction condition (1) holds, with the normalizing sequences $a_1,b_1$ and $a_2,b_2$ chosen such that the marginal d.f.'s $G_1$ and $G_2$ are as in (3). Taking logarithms in (2), and replacing the discrete index $n$ by a continuous index $t > 0$, we obtain

$$\lim_{t\to\infty} t\big[1 - F\big(a_1(t)x + b_1(t),\ a_2(t)y + b_2(t)\big)\big] = -\log G(x,y), \quad (x,y)\in\mathbb{R}^2.$$

Combining this with the corresponding marginal results and (5) leads to

$$\lim_{t\to\infty} t\,P\big(X_i > a_1(t)x + b_1(t),\ Y_i > a_2(t)y + b_2(t)\big) = R\big(-\log G_1(x), -\log G_2(y)\big),$$

or equivalently,

$$\lim_{t\to\infty} t\,P\big(X_i(t)\le x,\ Y_i(t)\le y\big) = R(x,y)$$

with

$$X_i(t) = \left(1+\gamma_1\frac{X_i - b_1(t)}{a_1(t)}\right)^{-1/\gamma_1}, \qquad Y_i(t) = \left(1+\gamma_2\frac{Y_i - b_2(t)}{a_2(t)}\right)^{-1/\gamma_2}, \tag{9}$$

for $i = 1,\dots,n$. We conclude that if we let $k = k(n)$ denote an intermediate sequence, that is, $k\to\infty$ and $k/n\to 0$ as $n\to\infty$, then

$$R_n(x,y) := \frac{n}{k}\,P\big(X_i(n/k)\le x,\ Y_i(n/k)\le y\big) \to R(x,y) \tag{10}$$

as $n\to\infty$, for all $(x,y)\in[0,\infty)^2$.

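We estimate $R_n$ and hence $R$ by replacing the unknown quantities $a_j(n/k)$, $b_j(n/k)$ and $\gamma_j$, $j = 1,2$, by appropriate estimators $\hat a_j(n/k)$, $\hat b_j(n/k)$ and $\hat\gamma_j$, and the probability $P$ by the corresponding empirical measure. We define, therefore,

$$\hat X_i(n/k) = \left(\left(1+\hat\gamma_1\frac{X_i - \hat b_1(n/k)}{\hat a_1(n/k)}\right)\vee 0\right)^{-1/\hat\gamma_1}, \qquad \hat Y_i(n/k) = \left(\left(1+\hat\gamma_2\frac{Y_i - \hat b_2(n/k)}{\hat a_2(n/k)}\right)\vee 0\right)^{-1/\hat\gamma_2} \tag{11}$$

and

$$\hat R_n(x,y) = \frac{1}{k}\sum_{i=1}^n \mathbf{1}\{\hat X_i(n/k)\le x,\ \hat Y_i(n/k)\le y\} \tag{12}$$

for $(x,y)\in[0,\infty)^2$; cf. de Haan and Resnick (1993).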
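The following is a minimal sketch of (11) and (12). The marginal estimators are left open here (see assumption A1 below). Purely for illustration, the sketch uses moment-type estimators: the moment estimator of $\gamma_j$, the accompanying scale estimate $\hat a_j(n/k) = X_{n-k,n}M_n^{(1)}(1-\hat\gamma_-)$, and $\hat b_j(n/k) = X_{n-k,n}$. These formulas are our reading of de Haan and Ferreira (2006), Sections 3.5 and 4.2, and they require $X_{n-k,n} > 0$. The function names are hypothetical. As a check, with $Y_i = X_i$ exactly $k+1$ observations satisfy $\hat X_i\le 1$ (those with $X_i\ge X_{n-k,n}$), so $\hat R_n(1,1) = (k+1)/k$.

```python
import numpy as np


def moment_estimates(x, k):
    """Moment-type estimates (gamma, a(n/k), b(n/k)) from the k largest observations.

    Illustrative reading of de Haan and Ferreira (2006), Secs. 3.5 and 4.2;
    requires X_{n-k,n} > 0.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    b = xs[n - k - 1]                      # X_{n-k,n}, the (n-k)-th order statistic
    if b <= 0:
        raise ValueError("the moment estimator needs X_{n-k,n} > 0")
    logs = np.log(xs[n - k:]) - np.log(b)  # log-excesses of the k largest values
    m1, m2 = logs.mean(), (logs ** 2).mean()
    gamma_minus = 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)
    gamma = m1 + gamma_minus               # moment estimator of gamma
    a = b * m1 * (1.0 - gamma_minus)       # scale estimate for a(n/k)
    return gamma, a, b


def standardize(x, gamma, a, b):
    """Transformation (11): ((1 + gamma (x - b) / a) v 0) ** (-1 / gamma)."""
    x = np.asarray(x, dtype=float)
    if abs(gamma) < 1e-12:                 # gamma = 0 convention: exp(-(x - b) / a)
        return np.exp(-(x - b) / a)
    base = np.maximum(1.0 + gamma * (x - b) / a, 0.0)
    with np.errstate(divide="ignore"):     # 0 ** (-1/gamma) = inf when gamma > 0
        return base ** (-1.0 / gamma)


def R_hat(x, y, k, u, v):
    """Semi-parametric estimator (12) of R(u, v)."""
    xt = standardize(x, *moment_estimates(x, k))
    yt = standardize(y, *moment_estimates(y, k))
    return np.sum((xt <= u) & (yt <= v)) / k


rng = np.random.default_rng(0)
x = rng.pareto(2.0, 20000) + 1.0           # Pareto tail, gamma = 1/2
print(moment_estimates(x, 500))
print(R_hat(x, x.copy(), 100, 1.0, 1.0))   # (k + 1) / k for identical samples
```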

We consider the empirical process

$$\eta_n(x,y) = \sqrt{k}\,\big[\hat R_n(x,y) - R(x,y)\big], \quad (x,y)\in[0,\infty)^2. \tag{13}$$

We will establish the asymptotic behavior of $\eta_n$ on $[\delta,T]^2$, for any $0<\delta<T<\infty$, but we introduce some definitions and assumptions first. Note that from now on we will omit the arguments $(n/k)$ where appropriate, for ease of notation.

Let $W_R(x,y)$ denote a Wiener process on $[0,\infty]^2\setminus\{(\infty,\infty)\}$ with "time" $R$, that is, a zero-mean Gaussian process with covariance

$$E[W_R(x,y)W_R(x',y')] = R(x\wedge x', y\wedge y').$$

Also write [cf. (10)]

$$T_n(x,y) = \frac{1}{k}\sum_{i=1}^n \mathbf{1}\{X_i(n/k)\le x,\ Y_i(n/k)\le y\}, \quad (x,y)\in[0,\infty)^2. \tag{14}$$

It is known, by Einmahl, de Haan and Sinha (1997), Lemma 3.1, that $\sqrt{k}(T_n - R_n)\Rightarrow W_R$ in $D([\delta,T]^2)$, where "$\Rightarrow$" denotes weak convergence and $D([\delta,T]^2)$ denotes the Skorohod space of functions defined on $[\delta,T]^2$.

In order to leave the estimators $\hat a_j$, $\hat b_j$ and $\hat\gamma_j$, $j = 1,2$, general at this stage, we simply assume that they are chosen in such a way that:

A1. For some 6-variate random vector $(A_1,A_2,B_1,B_2,\Gamma_1,\Gamma_2)$, we have the joint weak convergence

$$\sqrt{k}\left(T_n - R_n,\ \frac{\hat a_1}{a_1}-1,\ \frac{\hat a_2}{a_2}-1,\ \frac{\hat b_1-b_1}{a_1},\ \frac{\hat b_2-b_2}{a_2},\ \hat\gamma_1-\gamma_1,\ \hat\gamma_2-\gamma_2\right) \Rightarrow (W_R, A_1, A_2, B_1, B_2, \Gamma_1, \Gamma_2) \tag{15}$$

in $D([\delta,T]^2)\times\mathbb{R}^6$.

Assumption A1 is fulfilled for, for example, the moment estimators of $\gamma_j$, $a_j$ and $b_j$, provided that $k$ is chosen appropriately; see de Haan and Ferreira (2006), Sections 4.2 and 3.5. We further assume the following:

A2. The partial derivatives

$$R^{(1)}(x,y) := \frac{\partial R(x,y)}{\partial x}, \qquad R^{(2)}(x,y) := \frac{\partial R(x,y)}{\partial y}$$

exist and are continuous on $(0,\infty)^2$.

A3. The sequence $k$ is chosen such that

$$\sqrt{k}\sup_{(x,y)\in[\delta/2,\,T+1]^2}\big|R_n(x,y) - R(x,y)\big| \to 0.$$


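Finally, for $j = 1,2$, we define the following functions on $(0,\infty)$:

$$f_j(x) = \begin{cases} \dfrac{x(x^{\gamma_j}-1)}{\gamma_j}, & \gamma_j\neq 0,\\ x\log x, & \gamma_j = 0, \end{cases} \qquad g_j(x) = -x^{\gamma_j+1},$$

$$h_j(x) = \begin{cases} \dfrac{x(1-x^{\gamma_j})}{\gamma_j^2} + \dfrac{x\log x}{\gamma_j}, & \gamma_j\neq 0,\\ -\tfrac{1}{2}x\log^2 x, & \gamma_j = 0. \end{cases} \tag{16}$$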
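One way to see where (16) comes from is a heuristic of our own, not a step quoted from the proof. By (11), $\{\hat X_i\le x\} = \{X_i\ge \hat b + \hat a(x^{-\hat\gamma}-1)/\hat\gamma\}$. A first-order expansion of this threshold on the $X_i(n/k)$ scale shifts $x$ by approximately $-x^{1+\gamma}[B + A\,u_\gamma(x) + \Gamma\,\partial_\gamma u_\gamma(x)]/\sqrt{k}$, where $u_\gamma(x) = (x^{-\gamma}-1)/\gamma$ and $A, B, \Gamma$ are the limits in (15). Hence $f_j = -x^{1+\gamma_j}u_{\gamma_j}$, $g_j = -x^{1+\gamma_j}$ and $h_j = -x^{1+\gamma_j}\partial_\gamma u_\gamma|_{\gamma=\gamma_j}$. The sketch below codes (16), including its $\gamma_j = 0$ branches, and checks these identities and the continuity at $\gamma_j = 0$ numerically. The function names are illustrative.

```python
import numpy as np


def u(x, gam):
    """u_gamma(x) = (x**(-gamma) - 1) / gamma, with limit -log x at gamma = 0."""
    return -np.log(x) if gam == 0 else (x ** (-gam) - 1.0) / gam


def f_fun(x, gam):
    """f_j in (16)."""
    return x * np.log(x) if gam == 0 else x * (x ** gam - 1.0) / gam


def g_fun(x, gam):
    """g_j in (16)."""
    return -(x ** (gam + 1.0))


def h_fun(x, gam):
    """h_j in (16)."""
    if gam == 0:
        return -x * np.log(x) ** 2 / 2.0
    return x * (1.0 - x ** gam) / gam ** 2 + x * np.log(x) / gam


x, eps = 1.7, 1e-6
for gam in (-1.5, -0.3, 0.4, 1.2):
    du = (u(x, gam + eps) - u(x, gam - eps)) / (2 * eps)  # central difference in gamma
    print(gam,
          f_fun(x, gam) + x ** (1 + gam) * u(x, gam),      # ~0
          h_fun(x, gam) + x ** (1 + gam) * du)             # ~0 up to finite-difference error
```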


We are now ready to state the basic convergence result for $\eta_n$.

Theorem 2.1. Let $0<\delta<T<\infty$. If assumptions A1–A3 hold, then

$$\begin{aligned} \eta_n(x,y) \Rightarrow\ & W_R(x,y) + R^{(1)}(x,y)\big[f_1(x)A_1 + g_1(x)B_1 + h_1(x)\Gamma_1\big]\\ &+ R^{(2)}(x,y)\big[f_2(y)A_2 + g_2(y)B_2 + h_2(y)\Gamma_2\big] =: \eta(x,y) \end{aligned} \tag{17}$$

in $D([\delta,T]^2)$.


Remark. Note that we take $\delta>0$, since the result does not hold true in general for $\delta = 0$: the functions in (16) are unbounded near zero for $\gamma_j < -1$. This theorem is very similar to Theorem 5.1 in de Haan and Resnick (1993), where instead of $R$ the stable tail dependence function $l(x,y) = x + y - R(x,y)$ is estimated. We nevertheless offer a detailed proof of Theorem 2.1 in Can et al. (2015), since the statement and proof of Theorem 5.1 in de Haan and Resnick (1993) are not completely correct; in particular, our $\delta$ is taken to be 0 there.
2.1. Parametric empirical process. Now suppose that the tail copula $R$ is a member of some parametric family of tail copulas, $\mathcal{R} = \{R_\theta : \theta\in\Theta\}$, where $\Theta$ is an open subset of $\mathbb{R}^d$. Then there is a $\theta_0 = (\theta_{01},\dots,\theta_{0d})^T\in\Theta$ such that $R = R_{\theta_0}$. Let $\hat\theta = (\hat\theta_1,\dots,\hat\theta_d)^T$ denote an estimator of $\theta_0$, and consider the empirical process

$$\hat\eta_n(x,y) = \sqrt{k}\,\big[\hat R_n(x,y) - R_{\hat\theta}(x,y)\big], \quad (x,y)\in[0,\infty)^2, \tag{18}$$

the parametric version of (13). Our next result will establish the asymptotic behavior of $\hat\eta_n$. Since

$$\hat\eta_n(x,y) = \eta_n(x,y) + \sqrt{k}\,\big[R_{\theta_0}(x,y) - R_{\hat\theta}(x,y)\big], \tag{19}$$

the asymptotic behavior of $\hat\eta_n$ is an easy consequence of Theorem 2.1, under proper assumptions. We state those assumptions below.


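B1. There is a $(6+d)$-variate random vector $(A_1,A_2,B_1,B_2,\Gamma_1,\Gamma_2,C)$ such that

$$\sqrt{k}\left(T_n - R_n,\ \frac{\hat a_1}{a_1}-1,\ \frac{\hat a_2}{a_2}-1,\ \frac{\hat b_1-b_1}{a_1},\ \frac{\hat b_2-b_2}{a_2},\ \hat\gamma_1-\gamma_1,\ \hat\gamma_2-\gamma_2,\ \theta_0-\hat\theta\right) \Rightarrow (W_R, A_1, A_2, B_1, B_2, \Gamma_1, \Gamma_2, C) \tag{20}$$

in $D([\delta,T]^2)\times\mathbb{R}^{6+d}$.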

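B2. The first-order partial derivatives

$$R_{\theta(1)}(x,y) = \frac{\partial}{\partial x}R_\theta(x,y), \qquad R_{\theta(2)}(x,y) = \frac{\partial}{\partial y}R_\theta(x,y), \qquad \dot R_\theta(x,y) = \left(\frac{\partial}{\partial\theta_1}R_\theta(x,y),\dots,\frac{\partial}{\partial\theta_d}R_\theta(x,y)\right)$$

exist and are continuous for $(x,y,\theta)\in(0,\infty)^2\times B(\theta_0)$, for some neighborhood $B(\theta_0)$ of $\theta_0$ in $\Theta$.

B3. The sequence $k$ is chosen such that

$$\sqrt{k}\sup_{(x,y)\in[\delta/2,\,T+1]^2}\big|R_n(x,y) - R_{\theta_0}(x,y)\big| \to 0. \tag{21}$$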


Note that B3 is the same as A3; we restate it here for ease of presentation. Also note that by virtue of B2 the second term on the right-hand side of (19) is asymptotically equal in probability to

$$\dot R_{\theta_0}(x,y)\sqrt{k}(\theta_0 - \hat\theta),$$

which, by B1, converges weakly to $\dot R_{\theta_0}(x,y)C$. Thus we obtain the following corollary to Theorem 2.1.
Corollary 2.2. Let $0<\delta<T<\infty$. If assumptions B1–B3 hold, then

$$\begin{aligned} \hat\eta_n(x,y) \Rightarrow\ & W_{R_{\theta_0}}(x,y) + R_{\theta_0(1)}(x,y)\big[f_1(x)A_1 + g_1(x)B_1 + h_1(x)\Gamma_1\big]\\ &+ R_{\theta_0(2)}(x,y)\big[f_2(y)A_2 + g_2(y)B_2 + h_2(y)\Gamma_2\big]\\ &+ \dot R_{\theta_0}(x,y)C =: \hat\eta(x,y) \end{aligned} \tag{22}$$

in $D([\delta,T]^2)$.


3. Transforming $\hat\eta$ into a standard Wiener process. The limiting process $\hat\eta$ in (22) is of the general form

$$\xi(x,y) = W_R(x,y) + \sum_{j=1}^{\nu} Q_j(x,y)Z_j, \tag{23}$$

where $W_R$ denotes a Wiener process with time $R$, $\nu$ is a fixed integer, $Q_1,\dots,Q_\nu$ are deterministic functions mapping $[\delta,T]^2$ into $\mathbb{R}$ and $Z_1,\dots,Z_\nu$ are random variables.

It will be more convenient to consider the set-indexed version of (23),

$$\xi(B) = W_R(B) + \sum_{j=1}^{\nu} Q_j(B)Z_j =: W_R(B) + Q^T(B)Z, \tag{24}$$

where $B$ is a Borel subset of $[\delta,T]^2$, $W_R$ is a set-indexed Wiener process with time measure $R$ and $Q_1,\dots,Q_\nu$ are deterministic signed measures. In the right-hand side of (24), $Q(B)$ denotes the column vector consisting of $Q_1(B),\dots,Q_\nu(B)$ and $Z$ denotes the column vector consisting of $Z_1,\dots,Z_\nu$.

We will state a general transformation result about set-indexed processes $\xi$ of the form (24), which we will then apply to the process $\hat\eta$ in (22). The transformation is a suitable extension of the "innovation martingale transform" first discussed in Khmaladze (1981, 1988, 1993) in connection with parametric goodness-of-fit testing for univariate and multivariate distribution functions; see, in particular, Khmaladze (1993), Theorem 3.9. A good summary of the innovation martingale transform idea can be found in Koul and Swordson (2011); for a variety of statistical applications we refer to McKeague, Nikabadze and Sun (1995), Nikabadze and Stute (1997), Stute, Thies and Zhu (1998), Koenker and Xiao (2002, 2006), Khmaladze and Koul (2004, 2009), Delgado, Hidalgo and Velasco (2005) and Dette and Hetzler (2009), among others.

As in Khmaladze (1993), we will call a collection of subsets $\{A_u : 0\le u\le 1\}$ of $[\delta,T]^2$ a scanning family over $[\delta,T]^2$ if the following hold:

(i) $\mathrm{Leb}(A_0) = 0$, $\mathrm{Leb}(A_1) = (T-\delta)^2$,

(ii) $A_u\subset A_{u'}$ if $u < u'$,

(iii) $\mathrm{Leb}(A_{u'}\setminus A_u)\to 0$ if $u'\downarrow u$,

with Leb denoting Lebesgue measure. Note that for any $j\in\{1,\dots,\nu\}$ and Borel subset $B$ of $[\delta,T]^2$, the function $u\mapsto Q_j(B\cap A_u)$ generates a signed measure on $[0,1]$.


Theorem 3.1. Let $\xi$ be a set-indexed process of the form (24). Suppose there are functions $q_j : [\delta,T]^2\to\mathbb{R}$, $1\le j\le\nu$, that are square-integrable with respect to $R$ and that satisfy

$$Q_j(B) = \iint_B q_j(x,y)\,dR(x,y), \quad 1\le j\le\nu,$$

for any Borel set $B\subset[\delta,T]^2$. Let $\{A_u : 0\le u\le 1\}$ be a scanning family over $[\delta,T]^2$. Then the process

$$W_R(B) = \xi(B) - \int_0^1 Q^T(B\cap A_{du})\,I^{-1}(A_u^c)\iint_{A_u^c} q(x,y)\,d\xi(x,y) \tag{25}$$

is a Wiener process with time $R$, where $q(x,y)$ denotes the column vector consisting of $q_1(x,y),\dots,q_\nu(x,y)$, and the matrices $I(A_u^c)$ are defined by

$$I(A_u^c) = \iint_{A_u^c} q(x,y)q^T(x,y)\,dR(x,y), \quad u\in[0,1),$$

and are assumed to be invertible.


Now let us return to the setup of Section 2.1. We state the following assumption.

B4. For each $\theta\in\Theta$, the measure $R_\theta$ can be decomposed as $R_\theta = R_\theta^{(c)} + R_\theta^{(\infty)}$, where $R_\theta^{(\infty)}$ satisfies $R_\theta^{(\infty)}([0,\infty)^2) = 0$ and $R_\theta^{(c)}$ is absolutely continuous with respect to the Lebesgue measure on $(0,\infty)^2$, with a positive density $r_\theta$ that has continuous first-order partial derivatives with respect to $x, y, \theta_1,\dots,\theta_d$ for all $(x,y,\theta)\in(0,\infty)^2\times B(\theta_0)$, for some neighborhood $B(\theta_0)$ of $\theta_0$ in $\Theta$.

Note that B4 allows arbitrarily large masses on the "axes at infinity" $\{(x,\infty) : x>0\}\cup\{(\infty,y) : y>0\}$ for $R_\theta\in\mathcal{R}$, but excludes the case $R_\theta = R_\theta^{(\infty)}$, which corresponds to (strict) tail independence.
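For the logistic family (Model 2 in Section 6), $R_\theta(x,y) = x + y - (x^{1/\theta}+y^{1/\theta})^\theta$, a short calculation gives the density $r_\theta = \partial^2 R_\theta/\partial x\,\partial y = \frac{1-\theta}{\theta}(xy)^{1/\theta-1}(x^{1/\theta}+y^{1/\theta})^{\theta-2}$. This density is positive on $(0,\infty)^2$ for $\theta\in(0,1)$, in line with B4. The sketch below compares the mass of a rectangle integrated from this density with the inclusion–exclusion value from (8). The example family and the names are for illustration only.

```python
import numpy as np


def R_logistic(x, y, theta):
    """Logistic tail copula (Model 2)."""
    return x + y - (x ** (1 / theta) + y ** (1 / theta)) ** theta


def r_logistic(x, y, theta):
    """Density r_theta = d^2 R_theta / (dx dy) of the logistic tail copula."""
    s = x ** (1 / theta) + y ** (1 / theta)
    return (1 - theta) / theta * (x * y) ** (1 / theta - 1) * s ** (theta - 2)


def mass_from_density(theta, lo=1.0, hi=2.0, m=400):
    """Midpoint-rule integral of r_theta over [lo, hi]^2."""
    h = (hi - lo) / m
    mid = lo + h * (np.arange(m) + 0.5)
    xx, yy = np.meshgrid(mid, mid, indexing="ij")
    return r_logistic(xx, yy, theta).sum() * h * h


theta = 0.5
exact = (R_logistic(2.0, 2.0, theta) - R_logistic(1.0, 2.0, theta)
         - R_logistic(2.0, 1.0, theta) + R_logistic(1.0, 1.0, theta))
print(mass_from_density(theta), exact)  # the two values agree closely
```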

Let us define the following functions on $[\delta,T]^2$, with $f_j$, $g_j$ and $h_j$ as defined in (16):

$$\begin{aligned} Q_1(x,y) &= R_{\theta_0(1)}(x,y)f_1(x), & Q_4(x,y) &= R_{\theta_0(2)}(x,y)f_2(y),\\ Q_2(x,y) &= R_{\theta_0(1)}(x,y)g_1(x), & Q_5(x,y) &= R_{\theta_0(2)}(x,y)g_2(y),\\ Q_3(x,y) &= R_{\theta_0(1)}(x,y)h_1(x), & Q_6(x,y) &= R_{\theta_0(2)}(x,y)h_2(y) \end{aligned}$$

and

$$Q_{6+i}(x,y) = \frac{\partial}{\partial\theta_i}R_\theta(x,y)\Big|_{\theta=\theta_0}, \quad i = 1,\dots,d.$$


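Furthermore, let $q_i$ denote the Radon–Nikodym derivatives $dQ_i/dR_{\theta_0}$ for $i = 1,\dots,6+d$, or more explicitly:

$$\begin{aligned} q_1(x,y) &= f_1'(x) + f_1(x)\frac{\partial}{\partial x}\log r_{\theta_0}(x,y), & q_4(x,y) &= f_2'(y) + f_2(y)\frac{\partial}{\partial y}\log r_{\theta_0}(x,y),\\ q_2(x,y) &= g_1'(x) + g_1(x)\frac{\partial}{\partial x}\log r_{\theta_0}(x,y), & q_5(x,y) &= g_2'(y) + g_2(y)\frac{\partial}{\partial y}\log r_{\theta_0}(x,y),\\ q_3(x,y) &= h_1'(x) + h_1(x)\frac{\partial}{\partial x}\log r_{\theta_0}(x,y), & q_6(x,y) &= h_2'(y) + h_2(y)\frac{\partial}{\partial y}\log r_{\theta_0}(x,y) \end{aligned}$$

and

$$q_{6+i}(x,y) = \frac{\partial}{\partial\theta_i}\log r_\theta(x,y)\Big|_{\theta=\theta_0}, \quad i = 1,\dots,d.$$

As before, $q(x,y)$ will denote the column vector consisting of $q_1(x,y),\dots,q_{6+d}(x,y)$ for $(x,y)\in[\delta,T]^2$.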

We are now ready to apply Theorem 3.1 to $\hat\eta$ in (22). Instead of arbitrary Borel sets $B$, we consider rectangles $[\delta,x]\times[\delta,y]\subset[\delta,T]^2$, with

$$\hat\eta([\delta,x]\times[\delta,y]) := \hat\eta(x,y) - \hat\eta(\delta,y) - \hat\eta(x,\delta) + \hat\eta(\delta,\delta).$$

We also introduce the scanning family $A_u = [\delta,T]\times[\delta,(1-u)\delta+uT]$ for $0\le u\le 1$ and define the corresponding matrices

$$I(t) = \int_\delta^T\!\!\int_t^T q(s',t')q^T(s',t')\,dR_{\theta_0}(s',t'), \quad t\in[\delta,T). \tag{26}$$


Remark. From a likelihood theory point of view, the functions $q_1,\dots,q_{6+d}$ can be seen as score functions corresponding to the estimated quantities $a_1, b_1, \gamma_1, a_2, b_2, \gamma_2, \theta_{01},\dots,\theta_{0d}$, and the matrix $I(t)$ can be seen as a partial Fisher information matrix constructed from these score functions.
Corollary 3.2. If assumptions B2 and B4, restricted to $\theta = \theta_0$, hold, and the matrices $I(t)$ in (26) are invertible, then the process

$$\begin{aligned} W_R([\delta,x]\times[\delta,y]) &= \hat\eta([\delta,x]\times[\delta,y])\\ &\quad - \int_\delta^x\!\!\int_\delta^y q^T(s,t)I^{-1}(t)\left(\int_\delta^T\!\!\int_t^T q(s',t')\,d\hat\eta(s',t')\right)dR_{\theta_0}(s,t) \end{aligned}$$

is a Wiener process with time $R_{\theta_0}$ on $[\delta,T]\times[\delta,T)$.

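In order to obtain a standard Wiener process from $\hat\eta$, we normalize $W_R$ in the usual way, as follows.

Corollary 3.3. If assumptions B2 and B4, restricted to $\theta = \theta_0$, hold, and the matrices $I(t)$ in (26) are invertible, then the process

$$\begin{aligned} W([\delta,x]\times[\delta,y]) &:= \int_\delta^x\!\!\int_\delta^y \frac{1}{\sqrt{r_{\theta_0}(s,t)}}\,dW_R([\delta,s]\times[\delta,t])\\ &= \int_\delta^x\!\!\int_\delta^y \frac{1}{\sqrt{r_{\theta_0}(s,t)}}\,d\hat\eta(s,t)\\ &\quad - \int_\delta^x\!\!\int_\delta^y q^T(s,t)I^{-1}(t)\left(\int_\delta^T\!\!\int_t^T q(s',t')\,d\hat\eta(s',t')\right)\sqrt{r_{\theta_0}(s,t)}\,dt\,ds \end{aligned} \tag{27}$$

is a standard Wiener process on $[\delta,T]\times[\delta,T)$.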


4. Goodness-of-fit testing. In Section 2 we introduced the parametric empirical process $\hat\eta_n$ as the normalized difference between $R_{\hat\theta}$ and the semi-parametric estimator $\hat R_n$, and derived its weak limit $\hat\eta$. In Section 3 we described a transformation from $\hat\eta$ into a standard Wiener process $W$. In this section, we will apply the empirical version of the same transformation to $\hat\eta_n$, and prove that the resulting empirical process converges weakly to a standard Wiener process. This is the main result of this paper.

Define the empirical version of $W$ in (27) as follows, for $(x,y)\in[\delta,T]\times[\delta,T)$:

$$\begin{aligned} W_n([\delta,x]\times[\delta,y]) &= \int_\delta^x\!\!\int_\delta^y \frac{1}{\sqrt{r_{\hat\theta}(s,t)}}\,d\hat\eta_n(s,t)\\ &\quad - \int_\delta^x\!\!\int_\delta^y \hat q^T(s,t)\hat I^{-1}(t)\left(\int_\delta^T\!\!\int_t^T \hat q(s',t')\,d\hat\eta_n(s',t')\right)\sqrt{r_{\hat\theta}(s,t)}\,dt\,ds. \end{aligned} \tag{28}$$
Here, the vectors $\hat q$ and the matrices $\hat I$ are obtained by replacing the unknown marginal tail indices $\gamma_1,\gamma_2$ and the unknown parameter $\theta_0$ in the definition of $q$ by their estimators $\hat\gamma_1, \hat\gamma_2, \hat\theta$.
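The transformation in (27) and (28) can be approximated on a grid. The sketch below is our own Riemann-sum discretization, not the authors' code (which is in Can et al. (2015)). It assumes that the following are already available on a regular grid on $[\delta,T]^2$: the increments of $\hat\eta_n$ over the cells, and $\hat q$ and $r_{\hat\theta}$ evaluated at the cell midpoints. It replaces $(t,T]$ by the cells whose second index is at least that of $t$. The projection step has a useful check: any pure drift of the form $Q^T(B)z$, as in (24), is removed exactly. Because the transform is linear, adding such a drift to Wiener increments leaves the output unchanged. The array layout and names are illustrative.

```python
import numpy as np


def transform_on_grid(d_eta, q, r, cell_area):
    """Riemann-sum version of the transformation in (27)/(28) on a regular grid.

    d_eta     : (nx, ny) increments of the process over the grid cells
    q         : (p, nx, ny) score vector at the cell midpoints
    r         : (nx, ny) density r at the cell midpoints
    cell_area : area of one grid cell
    Returns the transformed process on [delta, x_i] x [delta, y_j] (cumulative sums).
    """
    p, nx, ny = q.shape
    qdR = q * (r * cell_area)              # q dR on each cell
    out = d_eta / np.sqrt(r)               # first term: r^{-1/2} d eta (a new array)
    for j in range(ny):
        # cells with second index >= j stand in for [delta, T] x (t_j, T]
        q_tail = q[:, :, j:].reshape(p, -1)
        info = qdR[:, :, j:].reshape(p, -1) @ q_tail.T  # I(t_j), shape (p, p)
        score = q_tail @ d_eta[:, j:].reshape(-1)       # integral of q d eta over the tail
        coef = np.linalg.solve(info, score)             # I(t_j)^{-1} times that integral
        out[:, j] -= (q[:, :, j].T @ coef) * np.sqrt(r[:, j]) * cell_area
    return out.cumsum(axis=0).cumsum(axis=1)


rng = np.random.default_rng(1)
nx, ny, p, area = 20, 15, 3, 0.01
q = rng.normal(size=(p, nx, ny))
r = rng.uniform(0.5, 2.0, size=(nx, ny))
z = rng.normal(size=p)
drift = np.einsum("kij,k->ij", q, z) * r * area        # increments of Q^T z, as in (24)
noise = rng.normal(size=(nx, ny)) * np.sqrt(r * area)  # Wiener increments with time R
print(np.abs(transform_on_grid(drift, q, r, area)).max())  # ~0: the drift is removed
```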

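For functions $\varphi : [\delta,T]^2\to\mathbb{R}$, we introduce the seminorm

$$\|\varphi\|_{HK} := V^{(2)}(\varphi) + V^{(1)}(\varphi(\cdot,\delta)) + V^{(1)}(\varphi(\delta,\cdot)) + V^{(1)}(\varphi(\cdot,T)) + V^{(1)}(\varphi(T,\cdot)), \tag{29}$$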

where $V^{(1)}$ denotes the univariate total variation over $[\delta,T]$, and $V^{(2)}$ denotes the bivariate (Vitali) total variation over $[\delta,T]^2$, as defined in Owen (2005), for example. The seminorm $\|\cdot\|_{HK}$ is sometimes called the Hardy–Krause variation in the literature, in recognition of Hardy (1905) and Krause (1903).
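On a finite grid, the quantities in (29) can be approximated by sums of absolute differences. These grid sums never exceed the corresponding variations, and for smooth $\varphi$ they approach them as the grid is refined. The sketch below is a minimal illustration under that discretization, and the names are hypothetical. For $\varphi(x,y) = xy$ on $[1,2]^2$ it gives $1 + 1 + 1 + 2 + 2 = 7$.

```python
import numpy as np


def variation_1d(values):
    """Total variation of a function sampled on an ordered grid."""
    return np.abs(np.diff(values)).sum()


def vitali_variation(phi):
    """Sum over grid cells of |mixed second difference|; approximates V^(2)."""
    return np.abs(np.diff(np.diff(phi, axis=0), axis=1)).sum()


def hk_seminorm(phi):
    """Grid version of (29); phi[i, j] = phi(x_i, y_j), first/last grid point = delta/T."""
    return (vitali_variation(phi)
            + variation_1d(phi[:, 0])    # phi(., delta)
            + variation_1d(phi[0, :])    # phi(delta, .)
            + variation_1d(phi[:, -1])   # phi(., T)
            + variation_1d(phi[-1, :]))  # phi(T, .)


grid = np.linspace(1.0, 2.0, 11)           # delta = 1, T = 2
print(hk_seminorm(np.outer(grid, grid)))   # phi(x, y) = x y: 1 + 1 + 1 + 2 + 2
```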
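For notational convenience, let us also denote

$$p_1(x,y,\theta) = \frac{\partial}{\partial x}\log r_\theta(x,y), \qquad p_2(x,y,\theta) = \frac{\partial}{\partial y}\log r_\theta(x,y),$$

$$p_{2+i}(x,y,\theta) = \frac{\partial}{\partial\theta_i}\log r_\theta(x,y), \quad i = 1,\dots,d,$$

and

$$\Delta p_j(x,y) = p_j(x,y,\hat\theta) - p_j(x,y,\theta_0), \quad j = 1,\dots,2+d.$$

Similarly, let

$$\sigma(x,y,\theta) = r_\theta(x,y)^{-1/2}, \qquad \Delta\sigma(x,y) = \sigma(x,y,\hat\theta) - \sigma(x,y,\theta_0).$$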


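We introduce the following assumption:

B5. For $j = 1,\dots,2+d$, $\|p_j(x,y,\theta_0)\|_{HK}<\infty$ and $\|\Delta p_j(x,y)\|_{HK} = o_P(1)$. Furthermore, $\|\sigma(x,y,\theta_0)\|_{HK}<\infty$ and $\|\Delta\sigma(x,y)\|_{HK} = o_P(1)$.

Given the consistency of $\hat\theta$, which is implied by B1, a sufficient (but not necessary) condition for B5 is the existence and continuity of the partial derivatives

$$\frac{\partial\varphi(x,\delta,\theta)}{\partial x}, \quad \frac{\partial\varphi(x,T,\theta)}{\partial x}, \quad \frac{\partial\varphi(\delta,y,\theta)}{\partial y}, \quad \frac{\partial\varphi(T,y,\theta)}{\partial y}, \quad \frac{\partial^2\varphi(x,y,\theta)}{\partial x\,\partial y}$$

on $(x,y,\theta)\in[\delta,T]^2\times B(\theta_0)$, for some neighborhood $B(\theta_0)$ of $\theta_0$ in $\Theta$, for $\varphi = \sigma$ and $\varphi = p_j$, $j = 1,\dots,2+d$.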

We can now present the main result of this paper.

Theorem 4.1. Let $0<\delta<\tau<T$, and let $W$ and $W_n$ be defined as in (27) and (28). If assumptions B1–B5 hold, then

$$W_n([\delta,x]\times[\delta,y]) \Rightarrow W([\delta,x]\times[\delta,y])$$

in $D([\delta,\tau]^2)$.
Note that Theorem 4.1 yields that under the null hypothesis $R\in\mathcal{R}$, we obtain a distribution-free limiting process $W$ (a standard bivariate Wiener process). Hence $W_n$ can be used as a "test process" for producing a myriad of asymptotically distribution-free test statistics to test this null hypothesis. We will consider examples of such tests in Section 6.
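Since the limit $W$ in Theorem 4.1 does not depend on $\mathcal{R}$ or $\theta_0$, critical values for any continuous functional of $W_n$ can be simulated once from a standard Wiener process on $[\delta,\tau]^2$. The rectangle increments of that process are independent, with variance equal to their area. The sketch below does this on a grid for two illustrative statistics: a Kolmogorov–Smirnov type $\sup|W_n|$ and a Cramér–von Mises type $\iint W_n^2$. These statistics, the grid size and the names are our own choices and are not necessarily those used in Section 6.

```python
import numpy as np


def brownian_sheet(rng, n_grid, side):
    """Standard Wiener process W([delta, x] x [delta, y]) on an n_grid x n_grid grid.

    side = tau - delta; every cell gets an independent N(0, cell area) increment.
    """
    cell = side / n_grid
    incr = rng.normal(scale=cell, size=(n_grid, n_grid))  # sd = sqrt(cell * cell)
    return incr.cumsum(axis=0).cumsum(axis=1)


def ks_stat(w):
    """Kolmogorov-Smirnov type statistic: sup |W| over the grid."""
    return np.abs(w).max()


def cvm_stat(w, side):
    """Cramer-von Mises type statistic: Riemann sum of W^2 over [delta, tau]^2."""
    cell = side / w.shape[0]
    return (w ** 2).sum() * cell * cell


def null_quantiles(level=0.95, reps=2000, n_grid=50, side=1.0, seed=0):
    """Monte Carlo quantiles of both statistics under the limit W."""
    rng = np.random.default_rng(seed)
    ks, cvm = [], []
    for _ in range(reps):
        w = brownian_sheet(rng, n_grid, side)
        ks.append(ks_stat(w))
        cvm.append(cvm_stat(w, side))
    return np.quantile(ks, level), np.quantile(cvm, level)
```

A level-5% test would then reject when the statistic computed from $W_n$, evaluated on the same grid with `side` equal to $\tau-\delta$, exceeds the corresponding returned quantile.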

Remark. By taking $\mathcal{R} = \{R_0\}$, where $R_0$ is a fully specified tail copula, we can use Theorem 4.1 for testing the null hypothesis $R = R_0$. In this case, the process $\hat\eta_n$ in the definition of $W_n$ [see (28)] reduces to $\eta_n$ as defined in (13), $r_{\hat\theta}$ reduces to $r_0 = dR_0/d\mathrm{Leb}$, and $\hat q$ and $\hat I$ are determined by $r_0$ and $\hat\gamma_1, \hat\gamma_2$. We will consider an example of testing $R = R_0$ in Section 6.


5. Multivariate extension. In this section we extend Theorem 4.1 from the bivariate to the $m$-dimensional setting, for $m>2$. The proof will be omitted, but it follows along very similar lines as in the bivariate case. In particular, Theorem 3.1 immediately generalizes to dimension $m$ and then serves as a basis for the main result of this section.

So suppose that we have an i.i.d. sample $\mathbf{X}_1,\dots,\mathbf{X}_n$ from an $m$-variate d.f. $F$ with marginal d.f.'s $F_1,\dots,F_m$. We write, for each $i\in\{1,\dots,n\}$, $\mathbf{X}_i = (X_{i1},\dots,X_{im})^T$, where $X_{ij}$ has d.f. $F_j$. We assume that $F$ is in the max-domain of attraction of an $m$-variate extreme value d.f. $G$, so there exist normalizing sequences $a_1(n),\dots,a_m(n)>0$ and $b_1(n),\dots,b_m(n)\in\mathbb{R}$ such that


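$$P\left(\frac{\bigvee_{i=1}^n X_{i1} - b_1(n)}{a_1(n)}\le x_1,\ \dots,\ \frac{\bigvee_{i=1}^n X_{im} - b_m(n)}{a_m(n)}\le x_m\right) \to G(\mathbf{x}),$$

with $\mathbf{x} = (x_1,\dots,x_m)^T\in\mathbb{R}^m$. We assume, as in the bivariate case, that the sequences $a_j$ and $b_j$, $j = 1,\dots,m$, are chosen in such a way that $G$ has marginal d.f.'s of the form

$$G_j(x) = \exp\{-(1+\gamma_j x)^{-1/\gamma_j}\}, \quad 1+\gamma_j x>0,$$

for some $\gamma_1,\dots,\gamma_m\in\mathbb{R}$. We will denote $\boldsymbol\gamma = (\gamma_1,\dots,\gamma_m)^T$. The d.f. $G$ is then characterized by the marginal tail indices $\boldsymbol\gamma$ and the $m$-variate tail copula

$$R(\mathbf{x}) = \lim_{t\to\infty} t\,P\left(\bigcap_{j=1}^m\{1-F_j(X_{1j})\le x_j/t\}\right), \quad \mathbf{x}\in[0,\infty]^m\setminus\{\boldsymbol\infty\},$$

where $\boldsymbol\infty$ denotes the point $(\infty,\dots,\infty)$.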


Remark. In the remainder of this section we consider $R$ defined on the restricted domain $[0,\infty)^m$ [cf. (4)] because our processes and transformations are not defined outside this region. The bivariate tail copula $R$ defined on $[0,\infty)^2$ determines $R$ on the full domain $[0,\infty]^2\setminus\{(\infty,\infty)\}$. In contrast, for $m>2$ the tail copula $R$ defined on $[0,\infty)^m$ in general does not determine $R$ on the full domain $[0,\infty]^m\setminus\{\boldsymbol\infty\}$.
Let $\mathcal{R} = \{R_\theta : \theta\in\Theta\}$ denote a parametric family of $m$-variate tail copulas on $[0,\infty)^m$, parametrized by $\theta = (\theta_1,\dots,\theta_d)^T\in\Theta$, an open subset of $\mathbb{R}^d$. Our aim is to enable the construction of tests for the null hypothesis $R\in\mathcal{R}$ against the alternative $R\notin\mathcal{R}$.

For fixed $\theta\in\Theta$, $R_\theta$ can be seen as an equivalence class of tail dependence structures (i.e., tail copulas defined on the full domain) containing one or more elements. Under the additional assumption that $R_\theta$ puts no mass on $[0,\infty]^m\setminus(\{\boldsymbol\infty\}\cup[0,\infty)^m)$, $R_\theta$ contains exactly one element (as in the bivariate case).

Suppose the null hypothesis holds true, with $R = R_{\theta_0}$, for some $\theta_0\in\Theta$. Let $\hat\theta$ denote an estimator for $\theta_0$. As in Section 2, we let $k = k(n)$ denote an intermediate sequence and define the parametric empirical process

$$\hat\eta_n(\mathbf{x}) = \sqrt{k}\,\big[\hat R_n(\mathbf{x}) - R_{\hat\theta}(\mathbf{x})\big], \quad \mathbf{x}\in[0,\infty)^m,$$

where

$$\hat R_n(\mathbf{x}) = \frac{1}{k}\sum_{i=1}^n \mathbf{1}\{\hat X_{ij}(n/k)\le x_j,\ j = 1,\dots,m\}, \quad \mathbf{x}\in[0,\infty)^m,$$

with $\hat X_{ij}(n/k)$, $(i,j)\in\{1,\dots,n\}\times\{1,\dots,m\}$, defined similarly as in (11). Let $R_n$ and $T_n$ denote the obvious $m$-variate extensions of (10) and (14), let $0<\delta<T<\infty$ and let C1–C4 denote the natural $m$-variate extensions of assumptions B1–B4 of Sections 2 and 3.

To state the analog of assumption B5 for the $m$-variate case, we extend the seminorm (29) to $m$-variate functions by induction, as follows. For any function $\varphi : [\delta,T]^m\to\mathbb{R}$ and $i\in\{1,\dots,m\}$, we define $\varphi_{\delta,i} : [\delta,T]^{m-1}\to\mathbb{R}$ to be the restriction of $\varphi$ to the subset of $[\delta,T]^m$ with the $i$th coordinate fixed at $\delta$, and we define $\varphi_{T,i}$ analogously. Then we let

$$\|\varphi\|_{HK}^{(m)} := V^{(m)}(\varphi) + \sum_{i=1}^m \|\varphi_{\delta,i}\|_{HK}^{(m-1)} + \sum_{i=1}^m \|\varphi_{T,i}\|_{HK}^{(m-1)}, \tag{30}$$

with $V^{(m)}$ denoting the $m$-variate (Vitali) total variation over $[\delta,T]^m$ and $\|\cdot\|_{HK}^{(2)} = \|\cdot\|_{HK}$ as defined in (29). We also let $p_j$, $\Delta p_j$, $\sigma$, $\Delta\sigma$ be defined as in Section 4, for $j = 1,\dots,m+d$.

C5. For $j = 1,\dots,m+d$, $\|p_j(\mathbf{x},\theta_0)\|_{HK}^{(m)}<\infty$ and $\|\Delta p_j(\mathbf{x})\|_{HK}^{(m)} = o_P(1)$. Furthermore, $\|\sigma(\mathbf{x},\theta_0)\|_{HK}^{(m)}<\infty$ and $\|\Delta\sigma(\mathbf{x})\|_{HK}^{(m)} = o_P(1)$.

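Now, let us introduce the functions $Q_j$ and $q_j = dQ_j/dR_{\theta_0}$, for $j = 1,\dots,3m+d$, as the natural $m$-variate extensions of the bivariate functions introduced before Corollary 3.2, and let us denote by $q(\mathbf{x})$ the column vector consisting of $q_1(\mathbf{x}),\dots,q_{3m+d}(\mathbf{x})$. Further, let us write $[\delta,\mathbf{x}] = [\delta,x_1]\times\cdots\times[\delta,x_m]$, $S_t = [\delta,T]^{m-1}\times(t,T]$, and introduce matrices

$$I(t) = \int_{S_t} q(\mathbf{s})q^T(\mathbf{s})\,dR_{\theta_0}(\mathbf{s}), \quad t\in[\delta,T),$$

which are assumed to be invertible. Then the $m$-variate analog of the transformed empirical process $W_n$ in (28) is

$$W_n([\delta,\mathbf{x}]) = \int_{[\delta,\mathbf{x}]} \frac{1}{\sqrt{r_{\hat\theta}(\mathbf{s})}}\,d\hat\eta_n(\mathbf{s}) - \int_{[\delta,\mathbf{x}]} \hat q^T(\mathbf{s})\hat I^{-1}(s_m)\left(\int_{S_{s_m}} \hat q(\mathbf{s}')\,d\hat\eta_n(\mathbf{s}')\right)\sqrt{r_{\hat\theta}(\mathbf{s})}\,d\mathbf{s}, \tag{31}$$

where $\hat q$ and $\hat I$ are obtained by replacing $\boldsymbol\gamma$ and $\theta_0$ by $\hat{\boldsymbol\gamma}$ and $\hat\theta$ in the definition of $q$.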

We are now ready to state the multivariate analog of Theorem 4.1. As in the bivariate case, this result can be used as a basis for producing a multitude of asymptotically distribution-free goodness-of-fit tests for a parametric model $\mathcal{R}$ (as well as for a fully specified tail copula $R_0$).

Theorem 5.1. Let $m>2$. Furthermore, let $0<\delta<\tau<T$, and let $W_n$ be defined as in (31). If assumptions C1–C5 hold, then

$$W_n([\delta,\mathbf{x}]) \Rightarrow W([\delta,\mathbf{x}])$$

in $D([\delta,\tau]^m)$, where $W$ is a standard $m$-variate Wiener process.

6. Simulation study. In this section we consider some specific functionals 
of W n under the null and alternative hypotheses, for three bivariate models 
TZ. We will see in Monte Carlo simulations that under the null hypothesis 
our limit theorems yield good approximations for finite sample size n, and 
we also find that the resulting tests have good power properties. This shows 
the applicability of our method. 

The three models we consider are the following:

Model 1. $R(x,y) = x + y - \sqrt{x^2+y^2}$;

Model 2. $\mathcal{R} = \{R_\theta : R_\theta(x,y) = x + y - (x^{1/\theta}+y^{1/\theta})^{\theta},\ \theta\in(0,1)\}$;

Model 3. $\mathcal{R} = \{R_\psi : R_\psi(x,y) = \psi\,(x + y - \sqrt{x^2+y^2}),\ \psi\in(0,1)\}$.

Model 2 is the widely used logistic family of tail copulas. Model 1 is a
fully specified tail copula and the special case $\theta = 1/2$ of Model 2. Model 3 is a mixture
of Model 1 and the tail independence model ($R = 0$). Note that the tail
copulas of Model 3 assign mass to the axes at infinity; indeed, the parameter $\psi$
determines how much mass is assigned there.
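For reference, the three families can be coded directly. The minimal sketch below (function names are our own) checks that Model 1 is the logistic member with $\theta = 1/2$, together with two standard properties of tail copulas: homogeneity of order one, and $0 \le R(x,y) \le \min(x,y)$.

```python
import math


def model1(x: float, y: float) -> float:
    """Model 1: the fully specified tail copula x + y - sqrt(x^2 + y^2)."""
    return x + y - math.hypot(x, y)


def model2(x: float, y: float, theta: float) -> float:
    """Model 2 (logistic family), theta in (0, 1)."""
    return x + y - (x ** (1.0 / theta) + y ** (1.0 / theta)) ** theta


def model3(x: float, y: float, psi: float) -> float:
    """Model 3: psi times the Model 1 tail copula, psi in (0, 1)."""
    return psi * model1(x, y)
```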
For each model, we first generate 300 samples of size $n = 1500$ from a “null
hypothesis d.f.” $F_0$ for which the model is correct. We use these samples to
assess the finite-sample performance of our main convergence result, Theorem 4.1. Next, we generate, for each model, 100 samples of size $n = 1500$
from an “alternative hypothesis d.f.” $F_a$ for which the model is incorrect.
These samples are used for power calculations.

In Section 6.1 below, we present the data generating distributions used for
each model. Then in Section 6.2 we describe our simulation results. Additional details about the simulations, including the verification of assumptions
and the computer code that was used, can be found in Can et al. (2015).

6.1. Data generating distributions. To test for Models 1 and 2 under the
null hypothesis, we generate samples from the bivariate Cauchy distribution
on the positive quadrant with density

$$f_0(x,y) = \frac{2}{\pi(1+x^2+y^2)^{3/2}},\qquad (x,y)\in[0,\infty)^2. \tag{32}$$

This distribution satisfies Model 1, and therefore also Model 2, with $\theta = 1/2$.

To test for Model 3 under the null hypothesis, we sample from the bivariate mixture random vector

$$(IX_1 + (1-I)X_2,\; IY_1 + (1-I)Y_2), \tag{33}$$

where $I$, $(X_1,Y_1)$, $(X_2,Y_2)$ are independent, $I\sim\mathrm{Bernoulli}(0.75)$, $(X_1,Y_1)$ has
the bivariate Cauchy distribution (32) on the positive quadrant and $(X_2,Y_2)$
is a pair of standard Cauchy absolute values coupled by the countermonotonic
copula. Since $(X_1,Y_1)$ has the Model 1 tail copula and $(X_2,Y_2)$ has tail independence, mixture (33) has the Model 3 tail copula with $\psi = 0.75$.
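A minimal sketch of how draws from (32) and (33) can be generated with Python's standard library (function names are hypothetical; the code actually used is described in Can et al. (2015)). It relies on the fact that a bivariate standard Cauchy vector, that is, a bivariate $t$ vector with one degree of freedom and identity scale matrix, can be written as $(Z_1, Z_2)/|W|$ with $Z_1, Z_2, W$ independent standard normal; taking componentwise absolute values folds it onto the positive quadrant and gives density (32). The countermonotonic pair uses the half-Cauchy quantile function $u \mapsto \tan(\pi u/2)$ evaluated at $u$ and $1-u$.

```python
import math
import random
from typing import List, Tuple

Pair = Tuple[float, float]


def draw_cauchy_quadrant(rng: random.Random) -> Pair:
    """One draw from density (32): componentwise absolute values of (Z1, Z2)/|W|."""
    w = abs(rng.gauss(0.0, 1.0))
    return abs(rng.gauss(0.0, 1.0)) / w, abs(rng.gauss(0.0, 1.0)) / w


def half_cauchy_quantile(u: float) -> float:
    """Quantile function of |C|, C standard Cauchy, whose d.f. is (2/pi) arctan(x)."""
    return math.tan(math.pi * u / 2.0)


def sample_mixture_33(n: int, p: float = 0.75, seed: int = 0) -> List[Pair]:
    """n draws from mixture (33) with I ~ Bernoulli(p)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        if rng.random() < p:
            # I = 1: tail-dependent component (X1, Y1) with density (32).
            sample.append(draw_cauchy_quadrant(rng))
        else:
            # I = 0: countermonotonic pair of |Cauchy| variables (tail independent).
            u = rng.random()
            sample.append((half_cauchy_quantile(u), half_cauchy_quantile(1.0 - u)))
    return sample
```

Both marginals of the mixture are half-Cauchy, so the sample medians should be close to $\tan(\pi/4) = 1$.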

To test for Model 1 under the alternative hypothesis, we sample from a
mixture random vector as in (33), where $I$, $(X_1,Y_1)$, $(X_2,Y_2)$ are independent
and $I\sim\mathrm{Bernoulli}(0.75)$ as before, but $(X_1,Y_1)$ has a bivariate logistic d.f.
with Fréchet marginals,

$$F(x,y) = \exp\{-[(1+x)^{-4} + (1+y)^{-4}]^{1/4}\},\qquad (x,y)\in(-1,\infty)^2, \tag{34}$$

and $(X_2,Y_2)$ has the same marginal d.f.'s as in (34), coupled by the countermonotonic copula. The resulting d.f. has the tail copula

$$R(x,y) = 0.75\,[x + y - (x^4+y^4)^{1/4}],\qquad (x,y)\in[0,\infty)^2.$$

To test for Model 2 under the alternative hypothesis, we sample from the
bivariate vector

$$(\lambda Z_1 + (1-\lambda)Z_2,\; \mu Z_1 + (1-\mu)Z_2), \tag{35}$$

where $Z_1$ and $Z_2$ denote independent standard Pareto random variables,
and $\lambda,\mu\in(0,1)$ are deterministic coefficients. We set $\lambda = 0.95$, $\mu = 0.65$ for
the simulations. The random vector (35) is a simple example of the linear
factor model, with associated tail copula

$$R(x,y) = \min\{\lambda x, \mu y\} + \min\{(1-\lambda)x, (1-\mu)y\},\qquad (x,y)\in[0,\infty)^2.$$
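As a plausibility check of this tail copula, one can simulate (35) and compare the usual rank-based empirical tail copula, $\hat R(x,y) = k^{-1}\sum_{i=1}^n 1\{n - \mathrm{rank}(X_i) < kx,\ n - \mathrm{rank}(Y_i) < ky\}$, with the formula. This is a minimal standard-library sketch with hypothetical names and sample sizes, not the estimator used in the paper; at $x = y = 1$ the formula gives $\min\{0.95, 0.65\} + \min\{0.05, 0.35\} = 0.70$.

```python
import random
from typing import List, Sequence, Tuple


def sample_factor_model(n: int, lam: float = 0.95, mu: float = 0.65,
                        seed: int = 1) -> Tuple[List[float], List[float]]:
    """n draws of (35) with independent standard Pareto factors Z1, Z2."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.paretovariate(1.0)  # P(Z > z) = 1/z for z >= 1
        z2 = rng.paretovariate(1.0)
        xs.append(lam * z1 + (1.0 - lam) * z2)
        ys.append(mu * z1 + (1.0 - mu) * z2)
    return xs, ys


def ascending_ranks(v: Sequence[float]) -> List[int]:
    """Ranks 1..n (ties occur with probability zero for continuous data)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    ranks = [0] * len(v)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def empirical_tail_copula(xs: Sequence[float], ys: Sequence[float],
                          k: int, x: float, y: float) -> float:
    n = len(xs)
    rx, ry = ascending_ranks(xs), ascending_ranks(ys)
    hits = sum(1 for a, b in zip(rx, ry) if n - a < k * x and n - b < k * y)
    return hits / k


def factor_tail_copula(x: float, y: float, lam: float = 0.95, mu: float = 0.65) -> float:
    return min(lam * x, mu * y) + min((1.0 - lam) * x, (1.0 - mu) * y)
```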

Finally, to test for Model 3 under the alternative hypothesis, we sample
from the following asymmetric logistic d.f. with Fréchet marginals:

$$F_a(x,y) = \exp\left\{-\left[\frac{1-\phi}{1+y} + \sqrt{\frac{1}{(1+x)^2} + \frac{\phi^2}{(1+y)^2}}\,\right]\right\},\qquad (x,y)\in(-1,\infty)^2, \tag{36}$$

with $\phi = 0.25$. This d.f. has the tail copula

$$R(x,y) = x + \phi y - \sqrt{x^2 + (\phi y)^2},\qquad (x,y)\in[0,\infty)^2.$$
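A simple way to see that these alternatives violate the respective null models with the stated parameter values: every member of Models 2 and 3 is symmetric, $R(x,y) = R(y,x)$, whereas the factor-model and asymmetric-logistic tail copulas above are not, and the Model 1 alternative differs from the fully specified Model 1 tail copula. A minimal sketch with hypothetical function names:

```python
import math


def r_model1(x: float, y: float) -> float:
    return x + y - math.hypot(x, y)


def r_alt1(x: float, y: float) -> float:
    """Alternative for Model 1: 0.75 [x + y - (x^4 + y^4)^(1/4)]."""
    return 0.75 * (x + y - (x ** 4 + y ** 4) ** 0.25)


def r_alt2(x: float, y: float, lam: float = 0.95, mu: float = 0.65) -> float:
    """Alternative for Model 2: the linear factor model (35)."""
    return min(lam * x, mu * y) + min((1.0 - lam) * x, (1.0 - mu) * y)


def r_alt3(x: float, y: float, phi: float = 0.25) -> float:
    """Alternative for Model 3: the asymmetric logistic tail copula."""
    return x + phi * y - math.hypot(x, phi * y)


def asymmetry(r, x: float = 1.0, y: float = 2.0) -> float:
    """R(x, y) - R(y, x); zero for every member of Models 2 and 3."""
    return r(x, y) - r(y, x)
```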


6.2. Simulation results. From each generated sample, the empirical process $W_n([\delta,x]\times[\delta,y])$ of (28) is computed on a $200\times200$ grid $\mathcal{G}$ of uniform
mesh length spanned over $[\delta,\tau]^2$, with $\delta = 0.001$ and $\tau = 1.001$. We take
$k = 250$ and $T = 2$ for all computations. The estimators $\hat\gamma_j$ and $\hat a_j$, $j = 1,2$,
are taken to be the moment estimators [see, e.g., de Haan and Ferreira
(2006), Sections 4.2 and 3.5], and we set as usual $\hat b_1 = X_{n-k:n}$, $\hat b_2 = Y_{n-k:n}$,
with $X_{j:n}$, $Y_{j:n}$ denoting the marginal order statistics. To estimate the parameters $\theta$ and $\psi$ of Models 2 and 3, we use the method of moments estimator
described in Einmahl, Krajina and Segers (2008), with auxiliary function
$g \equiv 1$.
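For concreteness, here is a minimal sketch of the marginal moment estimators, written from the formulas in de Haan and Ferreira (2006) as we recall them; the function name is hypothetical and the code assumes positive data, so that logarithms of the top order statistics exist. With $M_n^{(j)} = k^{-1}\sum_{i=0}^{k-1}(\log X_{n-i:n} - \log X_{n-k:n})^j$, $j = 1,2$, and $\hat\gamma_- = 1 - \tfrac12\{1 - (M_n^{(1)})^2/M_n^{(2)}\}^{-1}$, it returns $\hat\gamma = M_n^{(1)} + \hat\gamma_-$, $\hat a = X_{n-k:n}M_n^{(1)}(1 - \hat\gamma_-)$ and $\hat b = X_{n-k:n}$.

```python
import math
from typing import Sequence, Tuple


def moment_estimator(sample: Sequence[float], k: int) -> Tuple[float, float, float]:
    """Return (gamma_hat, a_hat, b_hat) based on the k largest observations
    and the threshold b_hat = X_{n-k:n}; requires X_{n-k:n} > 0."""
    xs = sorted(sample)
    n = len(xs)
    b_hat = xs[n - k - 1]  # X_{n-k:n}
    # Log-excesses log X_{n-i:n} - log X_{n-k:n}, i = 0, ..., k-1.
    logs = [math.log(xs[n - 1 - i] / b_hat) for i in range(k)]
    m1 = sum(logs) / k
    m2 = sum(v * v for v in logs) / k
    gamma_minus = 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
    gamma_hat = m1 + gamma_minus
    a_hat = b_hat * m1 * (1.0 - gamma_minus)
    return gamma_hat, a_hat, b_hat
```

On the exact quantiles of a standard Pareto distribution ($\gamma = 1$, for which $a(n/k)$ and $X_{n-k:n}$ are both close to $n/k$), $\hat\gamma$ should be close to 1 and $\hat a/\hat b$ close to 1.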

To compare the process $W_n$ to a standard Wiener process, three test
statistics are computed from each path of $W_n$. These are:

$$\kappa_n = \max_{(x,y)\in\mathcal{G}} |W_n([\delta,x]\times[\delta,y])| \quad\text{(Kolmogorov-Smirnov type)},$$

$$\omega_n^2 = \|\mathcal{G}\|^2 \sum_{(x,y)\in\mathcal{G}} W_n([\delta,x]\times[\delta,y])^2 \quad\text{(Cramér-von Mises type)},$$

$$A_n^2 = \|\mathcal{G}\|^2 \sum_{(x,y)\in\mathcal{G}} \frac{W_n([\delta,x]\times[\delta,y])^2}{xy} \quad\text{(Anderson-Darling type)},$$

where $\|\mathcal{G}\|$ denotes the mesh length of the grid $\mathcal{G}$, that is, 1/200. To create
benchmark distribution tables for these statistics, we also simulate 10,000
true standard Wiener process paths on the grid $\mathcal{G}$, and we compute the same
test statistics for each path. We denote these statistics, computed from the
true standard Wiener process, by $\kappa$, $\omega^2$ and $A^2$. In view of the asymptotically
distribution-free nature of our approach, these benchmark tables need to be
produced only once.
[Figure 1: 3 x 3 panels of PP-plots; columns Model 1, Model 2, Model 3; rows: empirical cdf of $\kappa_n$, $\omega_n^2$ and $A_n^2$.]

Fig. 1. PP-plots for the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling type test statistics.


For the 300 values of $\kappa_n$, $\omega_n^2$ and $A_n^2$ computed from the null hypothesis samples, we construct PP-plots to compare their empirical d.f.'s with
the empirical d.f.'s of $\kappa$, $\omega^2$ and $A^2$, respectively. The results are shown
in Figure 1. We see a good match of empirical d.f.'s for all three models,
which shows that Theorem 4.1 yields good finite-sample approximations.
This is also confirmed by the empirical size table given in the left panel of
Table 1, where the observed fractions of Type I errors at the 5% significance
level are shown. Note that these numbers are consistent with draws from a
Binomial(300, 0.05) distribution.
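The consistency claim can be made concrete. Under the null hypothesis each of the 300 tests rejects with probability approximately 0.05, so, treating the replications as independent, the number of rejections is approximately Binomial(300, 0.05), with mean 15. A minimal standard-library sketch (hypothetical names) computes the central 95% range of that distribution and checks all nine null-hypothesis counts of Table 1 against it.

```python
from math import comb
from typing import Tuple


def binom_cdf(x: int, n: int, p: float) -> float:
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(x + 1))


def central_range(n: int, p: float, level: float = 0.95) -> Tuple[int, int]:
    """Quantiles of order (1 - level)/2 and (1 + level)/2 of Binomial(n, p)."""
    alpha = (1.0 - level) / 2.0
    lower = next(x for x in range(n + 1) if binom_cdf(x, n, p) >= alpha)
    upper = next(x for x in range(n + 1) if binom_cdf(x, n, p) >= 1.0 - alpha)
    return lower, upper


# All nine rejection counts (out of 300) in the null-hypothesis panel of Table 1.
NULL_COUNTS = (15, 19, 9, 16, 11, 13, 21, 17, 18)
```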
Table 1
Observed rejection frequencies at the 5% significance level under null and alternative hypotheses

                  Null                             Alternative
          Model 1   Model 2   Model 3      Model 1   Model 2   Model 3
κ_n       15/300    19/300     9/300       97/100    92/100    97/100
ω_n²      16/300    11/300    13/300       99/100    90/100    97/100
A_n²      21/300    17/300    18/300      100/100    95/100   100/100

For the 100 values of the test statistics computed under each alternative
hypothesis, we present the observed fraction of rejections at the 5% significance level in the right panel of Table 1. All three tests have quite high
power.

7. Proofs. 


Proof of Theorem 3.1. First note that the terms following $V_R(B)$ in
(24) are “annihilated” by the transformation (25):

$$Q^T(B)Z - \int_0^1 Q^T(B\cap A_{du})\,I^{-1}(A_u^c)\iint_{A_u^c} q(x,y)\,q^T(x,y)\,dR(x,y)\,Z = Q^T(B)Z - \int_0^1 Q^T(B\cap A_{du})\,Z = 0.$$

Thus we can now compute, for Borel sets $B, B'\subset[\delta,T]^2$,

$$\begin{aligned}
\operatorname{Cov}[W_R(B), W_R(B')]
&= E\Big[\Big(V_R(B) - \int_0^1 Q^T(B\cap A_{du})\,I^{-1}(A_u^c)\iint_{A_u^c} q(x,y)\,dV_R(x,y)\Big)\\
&\qquad\times\Big(V_R(B') - \int_0^1 Q^T(B'\cap A_{du'})\,I^{-1}(A_{u'}^c)\iint_{A_{u'}^c} q(x,y)\,dV_R(x,y)\Big)\Big]\\
&= R(B\cap B') - \int_0^1 Q^T(B\cap A_{du})\,I^{-1}(A_u^c)\,Q(B'\cap A_u^c)\\
&\quad - \int_0^1 Q^T(B'\cap A_{du'})\,I^{-1}(A_{u'}^c)\,Q(B\cap A_{u'}^c)\\
&\quad + \int_0^1\!\!\int_0^1 Q^T(B\cap A_{du})\,I^{-1}(A_u^c)\,I(A_u^c\cap A_{u'}^c)\,I^{-1}(A_{u'}^c)\,Q(B'\cap A_{du'}).
\end{aligned}$$











DISTRIBUTION-FREE GOF TESTING FOR TAIL COPULAS 




Splitting the double integral into two double integrals, one over the region
$\{u < u'\}$ and the other over the region $\{u' < u\}$, we see that all the integral
terms cancel each other. This implies that $W_R$ has the covariance structure
of a Wiener process with time $R$. □

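Let $W_{n,R}$ denote the empirical version of $W_R$ in Corollary 3.2,

$$W_{n,R}([\delta,x]\times[\delta,y]) = V_n([\delta,x]\times[\delta,y]) - \int_\delta^x\int_\delta^y \hat q^T(s,t)\,\hat I^{-1}(t)\int_\delta^T\int_t^T \hat q(s',t')\,dV_n(s',t')\,dR_{\hat\theta}(s,t).$$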

The following result will be useful for the proof of Theorem 4.1. 


Proposition 7.1. Let $0 < \delta < \tau < T$. If assumptions B1-B5 hold, then

$$W_{n,R}([\delta,x]\times[\delta,y]) \xrightarrow{d} W_R([\delta,x]\times[\delta,y])$$

in $D([\delta,\tau]^2)$.


Proof. Applying Skorohod’s representation theorem [see, e.g., Billingsley (1999), Theorem 6.7] to Theorem 2.2, we obtain a probability space that
supports probabilistically equivalent versions of $\eta_n$ and $\eta$ satisfying

$$\|\eta_n - \eta\|_{[\delta,T]^2} \to 0 \quad\text{a.s.},$$

with $\|\varphi\|_{[a,b]^2} := \sup_{(x,y)\in[a,b]^2}|\varphi(x,y)|$. We will work on this space. Let us
denote


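$$\begin{aligned}
H(s,t) &= q^T(s,t)\,I^{-1}(t)\int_\delta^T\int_t^T q(s',t')\,d\eta(s',t'),\\
\hat H(s,t) &= \hat q^T(s,t)\,\hat I^{-1}(t)\int_\delta^T\int_t^T \hat q(s',t')\,d\eta(s',t'),\\
H_n(s,t) &= q^T(s,t)\,I^{-1}(t)\int_\delta^T\int_t^T q(s',t')\,d\eta_n(s',t'),\\
\hat H_n(s,t) &= \hat q^T(s,t)\,\hat I^{-1}(t)\int_\delta^T\int_t^T \hat q(s',t')\,d\eta_n(s',t').
\end{aligned}\tag{37}$$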

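We have to show that

$$\sup_{(x,y)\in[\delta,\tau]^2}\left|\int_\delta^x\int_\delta^y\big(\hat H_n(s,t)\,r_{\hat\theta}(s,t) - H(s,t)\,r_{\theta_0}(s,t)\big)\,dt\,ds\right| \xrightarrow{P} 0. \tag{38}$$

Since $\hat H_n r_{\hat\theta} - H r_{\theta_0} = H\,(r_{\hat\theta} - r_{\theta_0}) + (\hat H_n - H)\,r_{\hat\theta}$, it suffices to prove the two statements

$$\|H\,(r_{\hat\theta} - r_{\theta_0})\|_{[\delta,\tau]^2} \xrightarrow{P} 0,\qquad \|(\hat H_n - H)\,r_{\hat\theta}\|_{[\delta,\tau]^2} \xrightarrow{P} 0. \tag{39}$$

The first convergence in (39) follows from the continuity of $r_\theta(s,t)$ over
$(s,t,\theta)\in[\delta,T]^2\times B(\theta_0)$ and the continuity of $H(s,t)$ over $(s,t)\in[\delta,\tau]^2$.
The second convergence in (39) follows from

$$\|\hat H_n - H\|_{[\delta,\tau]^2} \xrightarrow{P} 0, \tag{40}$$

since $\|r_{\hat\theta}\|_{[\delta,\tau]^2} = O_P(1)$. We establish (40) by proving the two statements

$$\|\hat H_n - H_n\|_{[\delta,\tau]^2} \xrightarrow{P} 0,\qquad \|H_n - H\|_{[\delta,\tau]^2} \xrightarrow{P} 0. \tag{41}$$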

Consider the second statement in (41). Its left-hand side is equal to

$$\sup_{(s,t)\in[\delta,\tau]^2}\left|q^T(s,t)\,I^{-1}(t)\int_\delta^T\int_t^T q(s',t')\,d\Delta_n(s',t')\right|,$$

with $\Delta_n = \eta_n - \eta$. The vector function $q^T(s,t)I^{-1}(t)$ is bounded on $(s,t)\in
[\delta,\tau]^2$, by continuity. So it will suffice to show

$$\sup_{t\in[\delta,\tau]}\left|\int_\delta^T\int_t^T q_i(s',t')\,d\Delta_n(s',t')\right| \xrightarrow{P} 0,\qquad i = 1,\dots,6+d. \tag{42}$$


The double integral inside the absolute value bars can be rewritten, using
integration by parts [see Hildebrandt (1963), Section III.8], as follows:

$$\begin{aligned}
&q_i(T,T)\Delta_n(T,T) - q_i(T,t)\Delta_n(T,t) - q_i(\delta,T)\Delta_n(\delta,T) + q_i(\delta,t)\Delta_n(\delta,t)\\
&\quad - \int_\delta^T \Delta_n(s',T)\,dq_i(s',T) + \int_\delta^T \Delta_n(s',t)\,dq_i(s',t)\\
&\quad - \int_t^T \Delta_n(T,t')\,dq_i(T,t') + \int_t^T \Delta_n(\delta,t')\,dq_i(\delta,t')\\
&\quad + \int_\delta^T\int_t^T \Delta_n(s',t')\,dq_i(s',t').
\end{aligned}$$

Each of the first four terms is bounded in absolute value by $\|q_i\|_{[\delta,T]^2}\cdot
\|\Delta_n\|_{[\delta,T]^2}$, where the first factor is finite by continuity and the second
factor vanishes in probability. Moreover, each integral term is bounded in
absolute value by $\|q_i\|_{HK}\,\|\Delta_n\|_{[\delta,T]^2}$, which also vanishes in probability because $\|q_i\|_{HK} < \infty$, by virtue of the assumptions $\|p_j(x,y,\theta_0)\|_{HK} < \infty$ for
$j = 1,\dots,2+d$, and Proposition 1 of Blümlinger and Tichy (1989). Hence
(42) follows, and the second convergence in (41) is established.
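The two-dimensional integration-by-parts identity used above can be checked numerically. The minimal sketch below assumes smooth hypothetical stand-ins, $q_i(s,u) = e^s u$ for $q_i$ and $\Delta(s,u) = s^2u^2$ for $\Delta_n$, so that every Stieltjes integral becomes an ordinary integral of a partial derivative. It evaluates both sides with the midpoint rule, and the two values should agree up to quadrature error.

```python
import math
from typing import Callable, Tuple

DELTA, T_END, T_LOW = 0.001, 2.0, 0.5  # hypothetical values of delta, T and t


def q(s: float, u: float) -> float:     # stand-in for q_i
    return math.exp(s) * u


def q_s(s: float, u: float) -> float:   # partial derivative dq/ds
    return math.exp(s) * u


def q_u(s: float, u: float) -> float:   # partial derivative dq/du
    return math.exp(s)


def q_su(s: float, u: float) -> float:  # mixed partial d^2 q / (ds du)
    return math.exp(s)


def D(s: float, u: float) -> float:     # stand-in for Delta_n
    return s * s * u * u


def D_su(s: float, u: float) -> float:  # mixed partial d^2 Delta / (ds du)
    return 4.0 * s * u


def midpoint_1d(f: Callable[[float], float], a: float, b: float, n: int = 2000) -> float:
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def midpoint_2d(f: Callable[[float, float], float], a: float, b: float,
                c: float, d: float, n: int = 400) -> float:
    hs, hu = (b - a) / n, (d - c) / n
    return hs * hu * sum(f(a + (i + 0.5) * hs, c + (j + 0.5) * hu)
                         for i in range(n) for j in range(n))


def both_sides(delta: float = DELTA, T: float = T_END, t: float = T_LOW) -> Tuple[float, float]:
    """Left side: int_delta^T int_t^T q dDelta. Right side: the expansion above."""
    lhs = midpoint_2d(lambda s, u: q(s, u) * D_su(s, u), delta, T, t, T)
    corners = (q(T, T) * D(T, T) - q(T, t) * D(T, t)
               - q(delta, T) * D(delta, T) + q(delta, t) * D(delta, t))
    edges = (-midpoint_1d(lambda s: D(s, T) * q_s(s, T), delta, T)
             + midpoint_1d(lambda s: D(s, t) * q_s(s, t), delta, T)
             - midpoint_1d(lambda u: D(T, u) * q_u(T, u), t, T)
             + midpoint_1d(lambda u: D(delta, u) * q_u(delta, u), t, T))
    interior = midpoint_2d(lambda s, u: D(s, u) * q_su(s, u), delta, T, t, T)
    return lhs, corners + edges + interior
```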

It remains to prove the first convergence in (41). By virtue of the second
convergence there, and an analogous result for $\hat H_n$ and $\hat H$, it will suffice to
prove $\|\hat H - H\|_{[\delta,\tau]^2} \xrightarrow{P} 0$. Note that

$$\begin{aligned}
|\hat H(s,t) - H(s,t)| &\le \big|\hat q^T(s,t)\hat I^{-1}(t) - q^T(s,t)I^{-1}(t)\big|\,\left|\int_\delta^T\int_t^T \hat q(s',t')\,d\eta(s',t')\right|\\
&\quad + \big|q^T(s,t)I^{-1}(t)\big|\,\left|\int_\delta^T\int_t^T \big(\hat q(s',t') - q(s',t')\big)\,d\eta(s',t')\right|,
\end{aligned}\tag{43}$$

where $|\cdot|$ should be interpreted component-wise.

Let us write $q(s,t,z_1,z_2,w_1,\dots,w_d)$ to denote the vector $q(s,t)$ with the
values of $\gamma_1$ and $\gamma_2$ replaced by variables $z_1$ and $z_2$, and the values $\theta_{01},\dots,\theta_{0d}$
replaced by variables $w_1,\dots,w_d$. Then $q(s,t) = q(s,t,\gamma_1,\gamma_2,\theta_{01},\dots,\theta_{0d})$ and
$\hat q(s,t) = q(s,t,\hat\gamma_1,\hat\gamma_2,\hat\theta_1,\dots,\hat\theta_d)$.

Now consider the first term on the right-hand side of (43). Since the
vector $q(s,t,z_1,z_2,w_1,\dots,w_d)$ is continuous over $[\delta,\tau]^2\times\mathbb{R}^2\times B(\theta_0)$, we
have that $|\hat q^T(s,t)\hat I^{-1}(t) - q^T(s,t)I^{-1}(t)|$ is $o_P(1)$ uniformly over $(s,t)\in
[\delta,\tau]^2$. Moreover, an integration by parts argument as above yields that

$$\left|\int_\delta^T\int_t^T \hat q_i(s',t')\,d\eta(s',t')\right| \le \|\eta\|_{[\delta,T]^2}\cdot\big(4\|\hat q_i\|_{[\delta,T]^2} + 5\|\hat q_i\|_{HK}\big)$$

for $1 \le i \le 6+d$, where the right-hand side is $O_P(1)$. We conclude that the
first term on the right-hand side of (43) is $o_P(1)$ uniformly over $(s,t)\in[\delta,\tau]^2$.
Next, consider the second term on the right-hand side of (43). It follows
from the discussion above that the vector $|q^T(s,t)I^{-1}(t)|$ is $O_P(1)$ uniformly
over $(s,t)\in[\delta,\tau]^2$, so it will suffice to show

$$\sup_{t\in[\delta,\tau]}\left|\int_\delta^T\int_t^T \Delta q_i(s',t')\,d\eta(s',t')\right| \xrightarrow{P} 0, \tag{44}$$

with $\Delta q_i = \hat q_i - q_i$, for $i = 1,\dots,6+d$. Once again, an integration by parts
argument shows that the left-hand side of (44) is bounded from above by

$$\|\eta\|_{[\delta,T]^2}\cdot\big(4\|\Delta q_i\|_{[\delta,T]^2} + 5\|\Delta q_i\|_{HK}\big),$$

where $\|\eta\|_{[\delta,T]^2} < \infty$ a.s. and $\|\Delta q_i\|_{[\delta,T]^2} = o_P(1)$ by continuity. It remains
to establish $\|\Delta q_i\|_{HK} = o_P(1)$. For $i = 7,\dots,6+d$, this follows directly from
assumption B5. For $i = 1$, we have

$$\begin{aligned}
\|\Delta q_1\|_{HK} &= \|f_1(x,\hat\gamma_1)\,p_1(x,y,\hat\theta) - f_1(x,\gamma_1)\,p_1(x,y,\theta_0) + \Delta f_1'(x)\|_{HK}\\
&\le \|\Delta f_1(x)\,p_1(x,y,\theta_0)\|_{HK} + \|f_1(x,\hat\gamma_1)\,\Delta p_1(x,y)\|_{HK} + 2V^{(1)}(\Delta f_1').
\end{aligned}$$

Using Proposition 1 of Blümlinger and Tichy (1989), differentiability properties of $f_1$, $f_1'$ on $[\delta,T]$ and assumption B5, each term on the right-hand
side can be shown to be $o_P(1)$. The cases $i = 2,\dots,6$ are similar. Thus (44)
follows. □
Proof of Theorem 4.1. Note that we have

$$W_n([\delta,x]\times[\delta,y]) = \int_\delta^x\int_\delta^y \sigma(s,t,\hat\theta)\,dW_{n,R}([\delta,s]\times[\delta,t]),$$

$$W([\delta,x]\times[\delta,y]) = \int_\delta^x\int_\delta^y \sigma(s,t,\theta_0)\,dW_R([\delta,s]\times[\delta,t]).$$

Now, by Proposition 7.1 and Skorohod’s representation theorem, there exists
a probability space supporting versions of $W_{n,R}$ and $W_R$ which satisfy

$$\sup_{(x,y)\in[\delta,\tau]^2}|W_{n,R}([\delta,x]\times[\delta,y]) - W_R([\delta,x]\times[\delta,y])| =: \sup_{(x,y)\in[\delta,\tau]^2}|D_n(x,y)| \to 0 \quad\text{a.s.}$$

We work with this probability space. Writing $\Delta\sigma(s,t) = \sigma(s,t,\hat\theta) - \sigma(s,t,\theta_0)$, we have

$$|W_n([\delta,x]\times[\delta,y]) - W([\delta,x]\times[\delta,y])| \le \left|\int_\delta^x\int_\delta^y \Delta\sigma(s,t)\,dW_R([\delta,s]\times[\delta,t])\right| + \left|\int_\delta^x\int_\delta^y \sigma(s,t,\hat\theta)\,dD_n(s,t)\right|. \tag{45}$$

Applying integration by parts as in the proof of Proposition 7.1, we see that
the first term on the right-hand side of (45) is bounded by

$$\sup_{(s,t)\in[\delta,\tau]^2}|W_R([\delta,s]\times[\delta,t])|\cdot\big(4\|\Delta\sigma\|_{[\delta,\tau]^2} + 5\|\Delta\sigma\|_{HK}\big). \tag{46}$$

Since $W_R$ is a.s. bounded on $[\delta,\tau]^2$, $\|\Delta\sigma\|_{[\delta,\tau]^2} = o_P(1)$ by continuity, and
$\|\Delta\sigma\|_{HK} = o_P(1)$ by assumption B5, (46) vanishes in probability. Similarly,
the second term on the right-hand side of (45) is bounded by

$$\|D_n\|_{[\delta,\tau]^2}\cdot\big(4\|\sigma(\cdot,\cdot,\hat\theta)\|_{[\delta,\tau]^2} + 5\|\sigma(\cdot,\cdot,\hat\theta)\|_{HK}\big),$$

which also vanishes in probability since $\|D_n\|_{[\delta,\tau]^2} = o_P(1)$ and the two summands in the parentheses are $O_P(1)$. Thus the left-hand side of (45) is $o_P(1)$
uniformly over $(x,y)\in[\delta,\tau]^2$. □
SUPPLEMENTARY MATERIAL

Supplement to “Asymptotically distribution-free goodness-of-fit testing
for tail copulas” (DOI: 10.1214/14-AOS1304SUPP; .pdf). We provide a
proof of Theorem 2.1 as well as details about the Monte Carlo simulations
of Section 6.








REFERENCES 

Beirlant, J., Goegebeur, Y., Teugels, J. and Segers, J. (2004). Statistics of Extremes: Theory and Applications. Wiley, Chichester. MR2108013
Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley, New York.
MR1700749

Blümlinger, M. and Tichy, R. F. (1989). Topological algebras of functions of bounded
variation. I. Manuscripta Math. 65 245-255. MR1011435
Can, S. U., Einmahl, J. H. J., Khmaladze, E. V. and Laeven, R. J. A. (2015). Supplement to “Asymptotically distribution-free goodness-of-fit testing for tail copulas.”
DOI:10.1214/14-AOS1304SUPP.

Coles, S. C. and Tawn, J. A. (1991). Modelling extreme multivariate events. J. Roy. 
Statist. Soc. Ser. B 53 377-392. MR1108334 

de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, 
New York. MR2234156 

de Haan, L., Neves, C. and Peng, L. (2008). Parametric tail copula estimation and 
model testing. J. Multivariate Anal. 99 1260-1275. MR2419346 
de Haan, L. and Resnick, S. I. (1977). Limit theory for multivariate sample extremes.
Z. Wahrsch. Verw. Gebiete 40 317-337. MR0478290
de Haan, L. and Resnick, S. I. (1993). Estimating the limit distribution of multivariate
extremes. Comm. Statist. Stochastic Models 9 275-309. MR1213072
Delgado, M. A., Hidalgo, J. and Velasco, C. (2005). Distribution free goodness-of-fit 
tests for linear processes. Ann. Statist. 33 2568-2609. MR2253096 
Dette, H. and Hetzler, B. (2009). Khmaladze transformation of integrated variance 
processes with applications to goodness-of-fit testing. Math. Methods Statist. 18 97-116. 
MR2537360 

Einmahl, J. H. J., de Haan, L. and Li, D. (2006). Weighted approximations of tail
copula processes with application to testing the bivariate extreme value condition. Ann.
Statist. 34 1987-2014. MR2283724

Einmahl, J. H. J., de Haan, L. and Sinha, A. K. (1997). Estimating the spectral measure of an extreme value distribution. Stochastic Process. Appl. 70 143-171. MR1475660
Einmahl, J. H. J., Krajina, A. and Segers, J. (2008). A method of moments estimator
of tail dependence. Bernoulli 14 1003-1026. MR2543584
Einmahl, J. H. J., Krajina, A. and Segers, J. (2012). An M-estimator for tail dependence in arbitrary dimensions. Ann. Statist. 40 1764-1793. MR3015043
Hardy, G. H. (1905). On double Fourier series, and especially those which represent the 
double zeta-function with real and incommensurable parameters. Quart. J. Math. 37 
53-79. 

Hildebrandt, T. H. (1963). Introduction to the Theory of Integration. Academic Press, 
New York. MR0154957 

Huang, X. (1992). Statistics of bivariate extreme values. Ph.D. thesis, Univ. Rotterdam. 
Joe, H., Smith, R. L. and Weissman, I. (1992). Bivariate threshold methods for extremes. J. Roy. Statist. Soc. Ser. B 54 171-183. MR1157718
Khmaladze, E. V. (1981). A martingale approach in the theory of goodness-of-fit tests.
Teor. Veroyatnost. i Primenen. 26 246-265. MR0616619
Khmaladze, E. V. (1988). An innovation approach to goodness-of-fit tests in $\mathbb{R}^m$. Ann.
Statist. 16 1503-1516. MR0964936

Khmaladze, E. V. (1993). Goodness of fit problem and scanning innovation martingales. 
Ann. Statist. 21 798-829. MR1232520 

Khmaladze, E. V. and Koul, H. L. (2004). Martingale transforms goodness-of-fit tests 
in regression models. Ann. Statist. 32 995-1034. MR2065196 




CAN, EINMAHL, KHMALADZE AND LAEVEN 


Khmaladze, E. V. and Koul, H. L. (2009). Goodness-of-fit problem for errors in 
nonparametric regression: Distribution free approach. Ann. Statist. 37 3165-3185. 
MR2549556 

Koenker, R. and Xiao, Z. (2002). Inference on the quantile regression process. Econometrica 70 1583-1612. MR1929979

Koenker, R. and Xiao, Z. (2006). Quantile autoregression. J. Amer. Statist. Assoc. 101 
980-990. MR2324109 

Kotz, S. and Nadarajah, S. (2000). Extreme Value Distributions: Theory and Applications. Imperial College Press, London. MR1892574
Koul, H. L. and Swordson, E. (2011). Khmaladze transformation. In International
Encyclopedia of Statistical Science (M. Lovric, ed.) 715-718. Springer, Berlin.
Krause, M. (1903). Fouriersche Reihen mit zwei veränderlichen Grössen. Ber. Sächs.
Akad. Wiss. Leipzig 55 164-197.

McKeague, I. W., Nikabadze, A. M. and Sun, Y. Q. (1995). An omnibus test for 
independence of a survival time from a covariate. Ann. Statist. 23 450-475. MR1332576 
Nikabadze, A. and Stute, W. (1997). Model checks under random censorship. Statist. 
Probab. Lett. 32 249-259. MR1440835 

Owen, A. B. (2005). Multidimensional variation for quasi-Monte Carlo. In Contemporary 
Multivariate Analysis and Design of Experiments. Ser. Biostat. 2 49-74. World Sci. 
Publ., Hackensack, NJ. MR2271076 

Stute, W., Thies, S. and Zhu, L.-X. (1998). Model checks for regression: An innovation 
process approach. Ann. Statist. 26 1916-1934. MR1673284 


S. U. Can 
R. J. A. Laeven 

Faculty of Economics & Business 
Section Actuarial Science 
University of Amsterdam 
Valckenierstraat 65 
1018 XE Amsterdam 
The Netherlands 
E-mail: s.u.can@uva.nl 

r.j.a.laeven@uva.nl 

E. V. Khmaladze 
School of Mathematics, Statistics & OR 
Victoria University of Wellington 
P.O. Box 600 
Wellington 
New Zealand 

E-mail: estate.khmaladze@vuw.ac.nz 


J. H. J. Einmahl 

Department of Econometrics & OR 
and CentER 
Tilburg University 
P.O. Box 90153 
5000 LE Tilburg 
The Netherlands 

E-mail: j.h.j.einmahl@tilburguniversity.edu


